This is my version of ec2setup.txt, which I modified to work on my own home-grown Ubuntu 12.04 LTS instance.
Start with AMI # ami-3fec7956 (Ubuntu 12.04), 32GB
(ec2-run-instances ami-3fec7956 -t m1.large --region us-east-1 -z us-east-1d --block-device-mapping /dev/sda1=:32:false -k )
sudo apt-add-repository -y ppa:olivier-berten/geo
sudo add-apt-repository -y ppa:webupd8team/java
sudo aptitude update
sudo aptitude safe-upgrade -y
sudo aptitude full-upgrade -y
sudo aptitude install -y build-essential apache2 apache2.2-common apache2-mpm-prefork apache2-utils libexpat1 ssl-cert postgresql libpq-dev ruby1.8-dev ruby1.8 ri1.8 rdoc1.8 irb1.8 libreadline-ruby1.8 libruby1.8 libopenssl-ruby sqlite3 libsqlite3-ruby1.8 git-core libcurl4-openssl-dev apache2-prefork-dev libapr1-dev libaprutil1-dev subversion postgresql-9.1-postgis autoconf libtool libxml2-dev libbz2-1.0 libbz2-dev libgeos-dev proj-bin libproj-dev ocropus pdftohtml catdoc unzip ant openjdk-6-jdk lftp php5-cli rubygems flex postgresql-server-dev-9.1 proj libjson0-dev xsltproc docbook-xsl docbook-mathml gettext postgresql-contrib-9.1 pgadmin3 python-software-properties bison dos2unix
sudo aptitude install -y oracle-java7-installer
sudo aptitude install -y libgdal-dev
sudo aptitude install -y libgeos++-dev
sudo bash -c 'echo "/usr/lib/jvm/java-7-oracle/jre/lib/amd64/server" > /etc/'
sudo ldconfig
Note that at this point you should create a new user called ubuntu (I used my own user instead and had to modify various scripts and config files, as described below).
mkdir ~/sources
cd ~/sources
tar xfvz postgis-2.0.3.tar.gz
cd postgis-2.0.3
./configure --with-gui
./configure --with-gui --without-topology
If the GEOS version is incorrect, perform the following steps:
tar xjf geos-3.3.8.tar.bz2
cd geos-3.3.8
sudo make install
cd ~/sources/postgis-2.0.3
./configure --with-gui
Note that the above steps didn't work. There should be a way to set up the library load paths correctly, but I gave up.
Otherwise, continue here:
sudo make install
sudo ldconfig
sudo make comments-install
sudo sed -i "s/ident/trust/" /etc/postgresql/9.1/main/pg_hba.conf
sudo sed -i "s/md5/trust/" /etc/postgresql/9.1/main/pg_hba.conf
sudo sed -i "s/peer/trust/" /etc/postgresql/9.1/main/pg_hba.conf
sudo /etc/init.d/postgresql restart
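The sed commands above replace the first occurrence of ident, md5 or peer on each line, wherever it appears, rather than targeting the method column specifically. A quick check afterwards can list active entries whose method is still not trust. This is a sketch that assumes the CIDR address form used in Ubuntu's default pg_hba.conf (local rows: TYPE DATABASE USER METHOD; host rows: TYPE DATABASE USER ADDRESS METHOD); non_trust_entries is a hypothetical helper name.

```shell
# Sketch: print active pg_hba.conf entries whose auth method is not "trust".
# Assumes CIDR addresses (no separate netmask column).
# non_trust_entries is a hypothetical helper, not part of PostgreSQL.
non_trust_entries() {
  # $1: path to pg_hba.conf, or "-" to read it from stdin
  awk '
    NF == 0 || $1 ~ /^#/ { next }            # skip blank lines and comments
    {
      method = ($1 == "local") ? $4 : $5     # column holding the auth method
      if (method != "trust") print
    }
  ' "${1:-/etc/postgresql/9.1/main/pg_hba.conf}"
}
```

Since pg_hba.conf is usually readable only by the postgres user, pipe it in: `sudo cat /etc/postgresql/9.1/main/pg_hba.conf | non_trust_entries -`. No output means every active entry now uses trust.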
createdb -U postgres geodict
sudo -u postgres createdb template_postgis
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/postgis.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/spatial_ref_sys.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/postgis_comments.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/rtpostgis.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/raster_comments.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/topology.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/topology_comments.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/legacy.sql
sudo -u postgres psql -d template_postgis -f /usr/share/postgresql/9.1/contrib/postgis-2.0/legacy_gist.sql
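The nine psql commands above differ only in the script name, so they can be driven from a list. A minimal sketch, assuming the same PostgreSQL 9.1 contrib directory; run and load_postgis_template are hypothetical helper names, and DRY_RUN=1 prints the commands instead of executing them.

```shell
# Sketch: load the PostGIS 2.0 scripts into template_postgis in the same
# order as the individual psql commands above.
# Assumption: the Ubuntu 12.04 / PostgreSQL 9.1 contrib directory.
POSTGIS_SQL_DIR=${POSTGIS_SQL_DIR:-/usr/share/postgresql/9.1/contrib/postgis-2.0}

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

load_postgis_template() {
  local db script
  db=${1:-template_postgis}
  for script in postgis spatial_ref_sys postgis_comments rtpostgis \
                raster_comments topology topology_comments legacy legacy_gist; do
    # Stops if psql itself fails (e.g. a missing file). psql still exits 0
    # on SQL errors unless -v ON_ERROR_STOP=1 is added.
    run sudo -u postgres psql -d "$db" -f "$POSTGIS_SQL_DIR/$script.sql" || return 1
  done
}
```

Review the list with `DRY_RUN=1 load_postgis_template`, then run `load_postgis_template` for real.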
cd ~/sources
git clone git://
git clone git://
cd dstk
sudo gem install bundler
sudo bundle install
cd ~/sources/dstkdata
If you want to save disk space and don't need geo-statistics, you can skip everything
up until the comment indicating the end of the geostats loading.
createdb -U postgres -T template_postgis statistics
tar xzf statistics/gl_gpwfe_pdens_15_bil_25.tar.gz
export PATH=$PATH:/usr/lib/postgresql/9.1/bin/
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I gl_gpwfe_pdens_15_bil_25/glds15ag.bil public.population_density | psql -U postgres -d statistics
rm -rf gl_gpwfe_pdens_15_bil_25
unzip statistics/
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I Tiff/glc2000_v1_1.tif public.land_cover | psql -U postgres -d statistics
rm -rf Tiff
sudo mkdir /mnt/data
sudo chown pjm /mnt/data
cd /mnt/data
The zip files are here, or here, or here (password = ThanksCSI!).
sudo curl -O ""
I got the TIF files from here instead!
sudo curl -O ""
unrar SRTM_NE_250m_TIF.rar
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 SRTM_NE_250m.tif public.elevation | psql -U postgres -d statistics
rm -rf SRTM_NE_250m*
curl -O ""
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -a SRTM_W_250m.tif public.elevation | psql -U postgres -d statistics
rm -rf SRTM_W_250m*
curl -O ""
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -a -I SRTM_SE_250m.tif public.elevation | psql -U postgres -d statistics
rm -rf SRTM_SE_250m*
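The three SRTM loads follow a create/append pattern: the first raster2pgsql call creates public.elevation, the later ones add -a to append, and the spatial index (-I) is built together with the last tile. A sketch of that pattern as a function, assuming the tiles are already downloaded and extracted in the current directory; load_tiles is a hypothetical helper name.

```shell
# Sketch: load several raster tiles into one table, mirroring the
# NE / W / SE elevation commands above: the first tile creates the table,
# later tiles are appended (-a), and the index (-I) comes with the last one.
# load_tiles is hypothetical; DRY_RUN=1 prints instead of running.
load_tiles() {
  local table n i tif extra
  table=$1
  shift
  n=$#
  i=0
  for tif in "$@"; do
    i=$((i + 1))
    extra=""
    if [ "$i" -gt 1 ]; then extra="-a"; fi
    if [ "$i" -eq "$n" ]; then extra="${extra:+$extra }-I"; fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "raster2pgsql -s 4236 -t 32x32 ${extra:+$extra }$tif $table | psql -U postgres -d statistics"
    else
      # $extra is unquoted on purpose so "-a -I" splits into two flags
      raster2pgsql -s 4236 -t 32x32 $extra "$tif" "$table" |
        psql -U postgres -d statistics || return 1
    fi
  done
}
```

For example: `load_tiles public.elevation SRTM_NE_250m.tif SRTM_W_250m.tif SRTM_SE_250m.tif`.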
curl -O ""
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_1.bil public.mean_temperature_01 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_2.bil public.mean_temperature_02 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_3.bil public.mean_temperature_03 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_4.bil public.mean_temperature_04 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_5.bil public.mean_temperature_05 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_6.bil public.mean_temperature_06 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_7.bil public.mean_temperature_07 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_8.bil public.mean_temperature_08 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_9.bil public.mean_temperature_09 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_10.bil public.mean_temperature_10 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_11.bil public.mean_temperature_11 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I tmean_12.bil public.mean_temperature_12 | psql -U postgres -d statistics
rm -rf tmean_*
curl -O ""
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_1.bil public.precipitation_01 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_2.bil public.precipitation_02 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_3.bil public.precipitation_03 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_4.bil public.precipitation_04 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_5.bil public.precipitation_05 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_6.bil public.precipitation_06 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_7.bil public.precipitation_07 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_8.bil public.precipitation_08 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_9.bil public.precipitation_09 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_10.bil public.precipitation_10 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_11.bil public.precipitation_11 | psql -U postgres -d statistics
/usr/lib/postgresql/9.1/bin/raster2pgsql -s 4236 -t 32x32 -I prec_12.bil public.precipitation_12 | psql -U postgres -d statistics
rm -rf prec_*
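The 24 monthly climate loads above differ only in the month number. Note that the .bil file names use unpadded months (tmean_1.bil) while the table names are zero-padded (mean_temperature_01). A sketch of the same loads as a loop, assuming the extracted .bil files are in the current directory; load_monthly is a hypothetical helper name.

```shell
# Sketch: the monthly temperature / precipitation loads above as one loop.
# load_monthly is hypothetical; DRY_RUN=1 prints the commands instead.
load_monthly() {
  local prefix base m table
  prefix=$1   # file prefix, e.g. tmean or prec
  base=$2     # table base name, e.g. mean_temperature or precipitation
  m=1
  while [ "$m" -le 12 ]; do
    # Zero-pad the month only in the table name
    table=$(printf 'public.%s_%02d' "$base" "$m")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "raster2pgsql -s 4236 -t 32x32 -I ${prefix}_$m.bil $table | psql -U postgres -d statistics"
    else
      raster2pgsql -s 4236 -t 32x32 -I "${prefix}_$m.bil" "$table" |
        psql -U postgres -d statistics || return 1
    fi
    m=$((m + 1))
  done
}
```

Usage: `load_monthly tmean mean_temperature && rm -rf tmean_*`, then `load_monthly prec precipitation && rm -rf prec_*`.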
unzip /home/pjm/sources/dstkdata/statistics/ -d .
for f in *.tif; do raster2pgsql -s 4236 -t 32x32 -I "$f" "$(basename "$f" .tif)" | psql -U postgres -d statistics; done
rm -rf us
rm -rf metadata
This is the end of the geostats loading, continue from here if you decide to skip that part.
sudo gem install passenger
sudo passenger-install-apache2-module
You'll need to update the version number below to match the Passenger version that was actually installed.
This is what the build said:
LoadModule passenger_module /var/lib/gems/1.8/gems/passenger-5.0.18/buildout/apache2/
PassengerRoot /var/lib/gems/1.8/gems/passenger-5.0.18
PassengerDefaultRuby /usr/bin/ruby1.8
I changed the passenger version in the lines below to match what was found from the lines above:
sudo bash -c 'echo "LoadModule passenger_module /var/lib/gems/1.8/gems/passenger-5.0.18/buildout/apache2/" > /etc/apache2/mods-enabled/passenger.load'
sudo bash -c 'echo "PassengerRoot /var/lib/gems/1.8/gems/passenger-5.0.18" > /etc/apache2/mods-enabled/passenger.conf'
sudo bash -c 'echo "PassengerRuby /usr/bin/ruby1.8" >> /etc/apache2/mods-enabled/passenger.conf'
sudo bash -c 'echo "PassengerMaxPoolSize 3" >> /etc/apache2/mods-enabled/passenger.conf'
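Rather than copying the version number by hand, the installed Passenger directory can be picked up from the gems directory. A sketch, assuming the /var/lib/gems/1.8/gems location shown in the build output above; latest_passenger_root and passenger_conf are hypothetical helper names (Passenger's own `passenger-config --root` should report the same directory). The LoadModule line is left out because the module file name after buildout/apache2/ is not preserved in these notes.

```shell
# Sketch: find the newest installed Passenger gem directory and print the
# passenger.conf lines used above. GEMS_DIR defaults to the path from the
# build output; both helpers are hypothetical.
GEMS_DIR=${GEMS_DIR:-/var/lib/gems/1.8/gems}

latest_passenger_root() {
  # Highest passenger-X.Y.Z directory, using GNU sort's version ordering
  ls -d "$GEMS_DIR"/passenger-* 2>/dev/null | sort -V | tail -n 1
}

passenger_conf() {
  printf 'PassengerRoot %s\n' "$1"
  printf 'PassengerRuby %s\n' /usr/bin/ruby1.8
  printf 'PassengerMaxPoolSize %s\n' 3
}
```

Usage: `passenger_conf "$(latest_passenger_root)" | sudo tee /etc/apache2/mods-enabled/passenger.conf > /dev/null`.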
sudo sed -i "s/MaxRequestsPerChild[ \t][ \t][0-9][0-9]/MaxRequestsPerChild 20/" /etc/apache2/apache2.conf
I needed to change the DocumentRoot to match where the data was actually installed: in my case the sources directory was /home/pjm/sources instead of /home/ubuntu/sources.
Ideally there should have been a new user called ubuntu, but I didn't know about this until I was too far into the process.
sudo bash -c 'echo "
<VirtualHost *:8000>
DocumentRoot /home/pjm/sources/dstk/public
RewriteEngine On
RewriteCond %{HTTP_HOST} ^$ [NC]
RewriteRule ^(.*)$$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^$ [NC]
RewriteRule ^(.*)$$1 [R=301,L]
<Directory /home/pjm/sources/dstk/public>
AllowOverride all
Options -MultiViews
</Directory>
</VirtualHost>
" > /etc/apache2/sites-enabled/000-default'
sudo ln -s /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/rewrite.load
sudo /etc/init.d/apache2 restart
sudo gem install postgres -v ''
cd ~/sources/dstk
cd ~/sources
mkdir maxmind
cd maxmind
wget ""
gunzip GeoLiteCity.dat.gz
wget ""
tar xzvf GeoIP.tar.gz
cd GeoIP-1.4.8/
libtoolize -f
sudo make install
cd ..
svn checkout svn:// net-geoip
cd net-geoip/
ruby ext/extconf.rb
sudo make install
cd ~/sources
tar -xvzf libiconv-1.11.tar.gz
cd libiconv-1.11
./configure --prefix=/usr/local/libiconv
sudo make install
sudo ln -s /usr/local/libiconv/lib/ /usr/lib/
createdb -U postgres -T template_postgis reversegeo
cd ~/sources
git clone git://
cd osm2pgsql/
sed -i 's/version = BZ2_bzlibVersion();//' configure
sed -i 's/version = zlibVersion();//' configure
sudo make install
cd ..
osm2pgsql -U postgres -d reversegeo -p world_countries -S osm2pgsql/styles/ dstkdata/world_countries.osm -l
osm2pgsql -U postgres -d reversegeo -p admin_areas -S osm2pgsql/styles/ dstkdata/admin_areas.osm -l
osm2pgsql -U postgres -d reversegeo -p neighborhoods -S osm2pgsql/styles/ dstkdata/neighborhoods.osm -l
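Each of the three imports uses the data file's base name as the table prefix (-p), so they can be written as a loop. A sketch, meant to be run from ~/sources like the commands above; import_osm_layers and run are hypothetical helper names, and the style-file path is passed in because it is truncated in these notes.

```shell
# Sketch: the three reverse-geocoding imports above as one loop.
# Each dstkdata/<layer>.osm file is loaded with table prefix <layer> (-p),
# and -l keeps coordinates in latitude/longitude.
# Helpers are hypothetical; DRY_RUN=1 prints the commands instead.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

import_osm_layers() {
  local style layer
  style=$1   # path to the osm2pgsql style file
  for layer in world_countries admin_areas neighborhoods; do
    run osm2pgsql -U postgres -d reversegeo -p "$layer" -S "$style" \
      "dstkdata/$layer.osm" -l || return 1
  done
}
```

As noted below, these imports take hours, so running the loop in its own terminal window keeps the rest of the setup moving.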
The above commands take several hours to complete
I started the next set of commands in a new window...
cd ~/sources
git clone git://
cd boilerpipe/boilerpipe-core/
cd src
javac -cp ../dist/boilerpipe-1.1-dev.jar
cd ~/sources/dstk/
psql -U postgres -d reversegeo -f sql/loadukpostcodes.sql
osm2pgsql -U postgres -d reversegeo -p uk_osm -S ../osm2pgsql/ ../dstkdata/uk_osm.osm.bz2 -l
psql -U postgres -d reversegeo -f sql/buildukindexes.sql
cd ~/sources
git clone git://
cd geocoder
sudo make install
Build the latest TIGER/Line data for US address lookups
cd /mnt/data
mkdir tigerdata
cd tigerdata
mirror --parallel=5 .
mirror --parallel=5 .
cd ../ADDR
mirror --parallel=5 .
cd ~/sources/geocoder/build/
mkdir ../../geocoderdata/
./tiger_import ../../geocoderdata/geocoder2012.db /mnt/data/tigerdata/
Completed to here
cd ~/sources
git clone git://
cd sqlite3-ruby
ruby setup.rb config
ruby setup.rb setup
sudo ruby setup.rb install
cd ~/sources/geocoder
bin/rebuild_metaphones ../geocoderdata/geocoder2012.db
chmod +x build/build_indexes
build/build_indexes ../geocoderdata/geocoder2012.db
rm -rf /mnt/data/tigerdata
createdb -U postgres names
cd /mnt/data
curl -O ""
dos2unix yob*.txt
~/sources/dstk/dataconversion/analyzebabynames.rb . > babynames.csv
psql -U postgres -d names -f ~/sources/dstk/sql/loadnames.sql
Fix for Postgres crashes:
sudo sed -i "s/shared_buffers = [0-9A-Za-z]*/shared_buffers = 512MB/" /etc/postgresql/9.1/main/postgresql.conf
sudo sysctl -w kernel.shmmax=576798720
sudo bash -c 'echo "kernel.shmmax=576798720" >> /etc/sysctl.conf'
sudo bash -c 'echo "vm.overcommit_memory=2" >> /etc/sysctl.conf'
sudo sed -i "s/max_connections = 100/max_connections = 200/" /etc/postgresql/9.1/main/postgresql.conf
sudo /etc/init.d/postgresql restart
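PostgreSQL 9.1 allocates shared_buffers, plus additional shared structures whose size grows with settings such as max_connections, in a single System V shared memory segment, so kernel.shmmax has to be larger than shared_buffers alone. The arithmetic for the values used above:

```shell
# Worked arithmetic for the settings above (values in bytes).
shared_buffers=$((512 * 1024 * 1024))   # 512MB = 536870912
shmmax=576798720                        # value written to sysctl above
headroom=$((shmmax - shared_buffers))   # left for PostgreSQL's other shared structures

echo "shared_buffers: $shared_buffers"
echo "shmmax:         $shmmax"
echo "headroom:       $headroom (~$((headroom / 1024 / 1024)) MiB)"
```

This leaves 39927808 bytes (about 38 MiB) of headroom. Since raising max_connections to 200 also increases the overhead, this headroom is the first thing to revisit if PostgreSQL refuses to start after the restart.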
Remove files not needed at runtime
rm -rf /mnt/data/*
rm -rf ~/sources/libiconv-1.11.tar.gz
rm -rf ~/sources/postgis-2.0.3.tar.gz
cd ~/sources/
mkdir dstkdata_runtime
mv dstkdata/ethnicityofsurnames.csv dstkdata_runtime/
mv dstkdata/GeoLiteCity.dat dstkdata_runtime/
rm -rf dstkdata
mv dstkdata_runtime dstkdata
Up to this point, you'll have a 0.50 version of the toolkit.
The following will upgrade you to a 0.51 version
cd ~/sources/dstk
git pull origin master
I found that the toolkit was already up to date.
TwoFishes geocoder
cd ~/sources
mkdir twofishes
cd twofishes
mkdir bin
curl "" > bin/twofishes.jar
mkdir data
The original source link below is obsolete:
curl "" > data/
This one might work... it's unknown what was in versus
curl "" > data/
The ~/sources/dstk/ file must be edited to point to the new directory, changing this line:
java -Xmx1500M -jar /home/ubuntu/sources/twofishes/bin/twofishes.jar --hfile_basepath /home/ubuntu/sources/twofishes/data/latest/
to this:
java -Xmx1500M -jar /home/pjm/sources/twofishes/bin/twofishes.jar --hfile_basepath /home/pjm/sources/twofishes/data/2015-03-05-20-05-30.753698/
The entire ~/sources/dstk/ directory should be checked for any references to /home/ubuntu, and each one changed to point to /home/pjm instead.
I looked through the dstk directory and found several instances like these:
cd ~/sources/dstk
grep '/home/ubuntu' *
'/home/ubuntu/sources/dstk/dstk_server.rb', {
twofishes.conf:exec start-stop-daemon --start -c root --exec /home/ubuntu/sources/dstk/ -Xmx1500M -jar /home/ubuntu/sources/twofishes/bin/twofishes.jar --hfile_basepath /home/ubuntu/sources/twofishes/data/latest/
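Instead of editing each hit by hand, the /home/ubuntu references can be rewritten in one pass. A sketch, assuming GNU grep, xargs and sed (as on Ubuntu); fix_home_paths is a hypothetical helper name. It only swaps the home directory prefix, so the TwoFishes data directory name (latest versus the dated directory) still has to be changed by hand, and since sed -i edits files in place it is worth backing up the checkout first.

```shell
# Sketch: replace every occurrence of OLD with NEW in files under DIR.
# Assumes GNU grep/xargs/sed; fix_home_paths is hypothetical.
# OLD is used as a sed regex and "#" is the sed delimiter, so neither
# argument should contain '#', '&' or regex characters such as '.' or '*'.
fix_home_paths() {
  local dir old new
  dir=$1
  old=${2:-/home/ubuntu}
  new=${3:-$HOME}
  # -r recurse, -l list file names only, -Z NUL-terminate them;
  # .git is skipped so git's own files are left alone.
  grep -rlZ --exclude-dir=.git -- "$old" "$dir" |
    xargs -0 -r sed -i "s#$old#$new#g"
}
```

Usage: `fix_home_paths ~/sources/dstk /home/ubuntu /home/pjm`, then rerun `grep -r '/home/ubuntu' ~/sources/dstk` to confirm nothing is left.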
cd data
sudo cp ~/sources/dstk/twofishes.conf /etc/init/twofishes.conf
sudo service twofishes start
Here is what the VirtualHost configuration looks like already:
sudo bash -c 'echo "
<VirtualHost *:8000>
DocumentRoot /home/pjm/sources/dstk/public
RewriteEngine On
RewriteCond %{HTTP_HOST} ^$ [NC]
RewriteRule ^(.*)$$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^$ [NC]
RewriteRule ^(.*)$$1 [R=301,L]
<Directory /home/pjm/sources/dstk/public>
AllowOverride all
Options -MultiViews
</Directory>
</VirtualHost>
" > /etc/apache2/sites-enabled/000-default'
It will now be changed to this:
sudo bash -c 'echo "
<VirtualHost *:8000>
DocumentRoot /home/pjm/sources/dstk/public
RewriteEngine On
RewriteCond %{HTTP_HOST} ^$ [NC]
RewriteRule ^(.*)$$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^$ [NC]
RewriteRule ^(.*)$$1 [R=301,L]
# We have an internal TwoFishes server running on port 8081, so redirect
# requests that look like they belong to its API
ProxyPass /twofishes http://localhost:8081
<Directory /home/pjm/sources/dstk/public>
AllowOverride all
Options -MultiViews
Header set Access-Control-Allow-Origin \"*\"
Header set Cache-Control \"max-age=86400\"
</Directory>
</VirtualHost>
" > /etc/apache2/sites-enabled/000-default'
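Writing the VirtualHost through `bash -c 'echo "..."'` means every double quote or `$` in the config is interpreted by the shell. A here-document avoids most of that and lets the paths follow the current user instead of being hard-coded. A sketch; dstk_vhost is a hypothetical helper name, and the RewriteCond/RewriteRule lines are omitted because their host names are missing from these notes.

```shell
# Sketch: print the DSTK VirtualHost for a given checkout directory.
# Paths come from $1 instead of being hard-coded to /home/pjm or
# /home/ubuntu. Rewrite rules are omitted (host names not preserved).
# dstk_vhost is hypothetical.
dstk_vhost() {
  local root
  root=${1:-$HOME/sources/dstk}
  # Unquoted EOF so $root expands; "*" is never glob-expanded inside a
  # here-document, and no other "$" appears in the body.
  cat <<EOF
<VirtualHost *:8000>
  DocumentRoot $root/public
  # Internal TwoFishes server on port 8081; proxy its API path to it
  ProxyPass /twofishes http://localhost:8081
  <Directory $root/public>
    AllowOverride all
    Options -MultiViews
    Header set Access-Control-Allow-Origin "*"
    Header set Cache-Control "max-age=86400"
  </Directory>
</VirtualHost>
EOF
}
```

Usage: `dstk_vhost "$HOME/sources/dstk" | sudo tee /etc/apache2/sites-enabled/000-default > /dev/null`.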
sudo ln -s /etc/apache2/mods-available/rewrite.load /etc/apache2/mods-enabled/rewrite.load
sudo ln -s /etc/apache2/mods-available/proxy.load /etc/apache2/mods-enabled/proxy.load
sudo ln -s /etc/apache2/mods-available/proxy_http.load /etc/apache2/mods-enabled/proxy_http.load
sudo ln -s /etc/apache2/mods-available/headers.load /etc/apache2/mods-enabled/headers.load
sudo /etc/init.d/apache2 restart
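Running `ln -s` for rewrite.load a second time fails with "File exists", because that link was already created earlier in the setup. A sketch of an idempotent version that skips modules that are already linked; enable_modules, APACHE_DIR and SUDO are hypothetical names introduced for illustration. On Debian/Ubuntu, `sudo a2enmod rewrite proxy proxy_http headers` does the same job and reports modules that are already enabled.

```shell
# Sketch: enable Apache modules by symlinking mods-available/<mod>.load
# into mods-enabled, skipping any that are already linked.
# APACHE_DIR and SUDO are hypothetical knobs (handy for testing against a
# scratch directory); set SUDO="" when already running as root.
APACHE_DIR=${APACHE_DIR:-/etc/apache2}
SUDO=${SUDO-sudo}

enable_modules() {
  local mod link
  for mod in "$@"; do
    link="$APACHE_DIR/mods-enabled/$mod.load"
    # -L also catches a dangling symlink, which -e alone would miss
    if [ -e "$link" ] || [ -L "$link" ]; then
      echo "$mod already enabled"
    else
      $SUDO ln -s "$APACHE_DIR/mods-available/$mod.load" "$link" || return 1
    fi
  done
}
```

Usage: `enable_modules rewrite proxy proxy_http headers && sudo /etc/init.d/apache2 restart`.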
I now go to and I get the datasciencetoolkit webpage along with all the tools!! Nice!!