-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add some more tests #33
base: rdbms-sink
Are you sure you want to change the base?
Conversation
The statement to set the representative position of a way was having weird issues on travis, so I refactored it. Also, I renamed the a column called “nodes” to “node_ids” to avoid calling a table and column the same thing.
Codecov Report
@@ Coverage Diff @@
## rdbms-sink #33 +/- ##
=============================================
Coverage ? 51.94%
Complexity ? 277
=============================================
Files ? 37
Lines ? 2058
Branches ? 233
=============================================
Hits ? 1069
Misses ? 910
Partials ? 79
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good as a first step toward expanding testing of these libraries.
@@ -24,6 +29,9 @@ script: | | |||
mvn clean verify --settings maven-settings.xml | |||
fi | |||
|
|||
after_success: | |||
- bash <(curl -s https://codecov.io/bash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is running arbitrary code off a remote URL. This raises a red flag for me, especially on a repo that contains signing keys. Can we copy the script into our repo so we know what we're running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The script at the url is pretty large and may change over time. I think this request is a bit excessive because that would also mean that we should examine and put every single maven dependency in our repo because that code could also echo environment variables or send it to a remote server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True enough but Maven artifacts are signed. This is taking arbitrary code from a URL with no verification, and anyone can stick a proxy in between you and the server. It's always debatable how paranoid one needs to be, but the recent trend in piping the contents of URLs into bash is really a security hole.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it's also unlikely we'd ever know if someone got the keys, which then compromises the signing system of Maven. The security problem is contagious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... but it's an https URL so there's a certificate for a particular domain involved. Maybe it's not so insecure then...
pom.xml
Outdated
@@ -135,6 +135,25 @@ | |||
</execution> | |||
</executions> | |||
</plugin> | |||
<plugin> | |||
<groupId>org.jacoco</groupId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add XML comments to the POM briefly explaining what this is. I know all the other dependencies don't have this but we should start adding it. It makes it less painful to browse through in the future.
@@ -246,7 +244,10 @@ public void writeEnd() throws IOException { | |||
// statement.execute("create index on nodes(lat, lon)"); | |||
LOG.info("Assigning representative coordinates to ways..."); | |||
statement.execute("alter table ways add column rep_lat float(9), add column rep_lon float(9)"); | |||
statement.execute("update ways set (rep_lat, rep_lon) = (select lat, lon from nodes where nodes.node_id = nodes[array_length(nodes, 1)/2])"); | |||
statement.execute("UPDATE ways " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I don't like the peculiar SQL convention of capitalizing keywords, especially since it's not checked by the parser so doesn't add any semantic safety. It mostly just achieves a cyber 1980s look. It's of course not going to directly hurt anything to have some statements using one convention and some using another, but just for consistency I'd rather do all lower case. This is a pretty superficial comment though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll change it to lowercase
@@ -71,16 +66,14 @@ public static void main(String[] args) { | |||
try { | |||
httpServer.start(); | |||
LOG.info("VEX server running."); | |||
Thread.currentThread().join(); | |||
// updateThread.interrupt(); | |||
//Thread.currentThread().join(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you take out the thread join and the server shutdown? It will probably work without it but it's more clean to properly shut down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this because it was causing the VanillaExtractTest to hang when it tried to start the server. The test would not continue after the server started up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be changing the flow of our main method in order to make a test easier to write. We need to establish clearly what we're losing by not doing the join and clean shutdown -- this doesn't seem like the kind of thing that should just be changed without understanding the consequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a java expert so I don't understand what the thread joining does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thread.currentThread().join();
is kind of strange. It just blocks the current thread forever, waiting for itself to end. It only makes sense because this is intended to be run as a command line main method. Once the server thread starts, we don't want the main thread to shut down and end the whole process. This is another reason that if we want to test this logic, it should be factored out and called slightly differently from a test and from a main method.
.travis.yml
Outdated
@@ -8,6 +8,11 @@ jdk: | |||
before_install: | |||
- openssl aes-256-cbc -K $encrypted_3ec7d78ab93d_key -iv $encrypted_3ec7d78ab93d_iv -in maven-artifact-signing-key.asc.enc -out maven-artifact-signing-key.asc -d | |||
- gpg --import --batch maven-artifact-signing-key.asc | |||
- psql -U postgres -c 'CREATE DATABASE osm_lib_test;' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add comments to this file explaining why all these lines are here. Not just the database setup for the tests but the encrypted variables etc.
import static org.hamcrest.MatcherAssert.assertThat; | ||
import static org.hamcrest.Matchers.is; | ||
|
||
public class PostgresTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a matter of policy all new classes and methods should have comments. This should contain a human-readable explanation of what is being tested.
} | ||
} | ||
|
||
class TableTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clear data types.
}; | ||
|
||
// perform file to postgres load | ||
PostgresSink.main(args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it kind of strange to be passing string arguments into a main method in a test. Main methods are really only intended to be used as a command line interface. Ideally we'd pull the "load a feed" functionality out of the main method and call the same functionality from the main and the test. But this will do for now.
import static org.hamcrest.Matchers.*; | ||
|
||
public class VanillaExtractTest { | ||
static final String TEST_FILE = "./src/test/resources/porto_portugal.osm.pbf"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resources are associated with a package (in this case com.conveyal.osmlib) so it should be possible to put them in the appropriate place (/src/test/resources/com/conveyal/osmlib/porto_portugal.osm.pbf) then open them as a stream without giving full paths. That would require us to have load methods that accepted something other than a string path though.
|
||
// assert that the response is not an error response | ||
assertThat(response, not(containsString("URI format"))); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also attempt to decode the data that comes back as a PBF file and make sure it contains anything meaningful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some more logic that tests this.
connection = DriverManager.getConnection(pgUrl); | ||
} catch (SQLException e) { | ||
e.printStackTrace(); | ||
LOG.error("Error creating new database!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log message does not correspond to the purpose of the method.
connection.prepareStatement(statement).execute(); | ||
} catch (SQLException e) { | ||
e.printStackTrace(); | ||
LOG.error("Error creating new database!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log message does not correspond to the purpose of the method.
* An MD5 can be represented as 32 hex digits so we don't save much length, but we do make it entirely alphabetical. | ||
* log base 2 of 26 is about 4.7, so each character represents about 4.7 bits of randomness. | ||
* | ||
* The question remains of whether we need globally unique IDs or just application-unique IDs. The downside of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code appears to be copied out of gtfs-lib. Since these are separate libraries I guess we have no choice but to duplicate the code. But the javadoc comments contain references to "feeds" and "older databases" which only make sense in the gtfs-lib context.
I'm also wondering why we need random database names - weren't you doing it before with a single predetermined name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using a single name, but I want to avoid that because I want to run each test on it's own database so that tests won't depend on each other. I've thought of a different idea for creating unique IDs and will implement that instead.
Other than the concerns about curl | bash (for which I don't think there's a much better solution) this looks pretty good. Should we merge and continue to add more tests later? |
@abyrd This PR also looks like it is awaiting your approval. |
Add tests for the VanillaExtractServer and PostgresSQL.
Also, calculate code coverage and send results to codecov.io.