
Piwik Data Archiver

Overview

The Piwik Data Archiver is a command-line tool which dumps your Piwik log data to .csv files and then uploads these files to Amazon S3, ready for you to analyze using SnowPlow. SnowPik is written in Scala.

Please note: SnowPik is functionally complete except for the period argument, which has yet to be implemented. SnowPik currently extracts all Piwik log data right up to the present moment.

Health Warning

For performance reasons, SnowPik talks directly to your Piwik MySQL database - if you want a version which uses the Piwik API, please vote for this GitHub issue.

While all of its database operations are non-destructive, we strongly recommend creating a dedicated user account in your Piwik MySQL database for SnowPik, and giving this SnowPik user only the bare-minimum permissions.

Note that you use this software at your own risk - Orderly Ltd takes no responsibility for any loss of data incurred through the use of it.

Setup

1. Before you start

To make full use of SnowPik, you will need the following (a quick way to verify each prerequisite is sketched just after this list):

  • A running Piwik installation
  • Access to your Piwik MySQL database from a computer with the Java runtime
  • An Amazon S3 account
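
A minimal sketch for checking the first two prerequisites from your shell - it assumes the mysql command-line client is installed and your Piwik database is named piwik on localhost (adjust the host, user and database to your setup):

# Check that a Java runtime is on the path
$ java -version

# Check that you can reach the Piwik MySQL database
$ mysql -h localhost -u root -p -e "SHOW TABLES;" piwik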

2. Download and install

You can download the latest build of SnowPik from this GitHub repository. Right-click on this download link and click Save As..., or alternatively:

$ sudo mkdir /opt/snowpik
$ cd /opt/snowpik
$ wget https://github.com/downloads/snowplow/snowplow-piwik-connector/snowpik-0.1.zip
$ unzip snowpik-0.1.zip
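
Optionally, confirm that the archive unpacked as expected and that the launcher script is executable - a minimal check, assuming the .zip unpacks snowpik.sh into /opt/snowpik (the exact layout may differ):

$ ls /opt/snowpik
$ sudo chmod +x /opt/snowpik/snowpik.sh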

3. Setup MySQL access

Set up a dedicated MySQL user account for SnowPik, and grant the user SELECT (i.e. read-only) permissions on the five tables that it needs to read:

-- Create the SnowPik user
CREATE USER 'snowpik'@'localhost' IDENTIFIED BY '<password>';

-- Set the SnowPik user's permissions
GRANT SELECT ON `piwik`.`piwik_log_action` TO 'snowpik'@'localhost' WITH GRANT OPTION;
GRANT SELECT ON `piwik`.`piwik_log_visit` TO 'snowpik'@'localhost' WITH GRANT OPTION;
GRANT SELECT ON `piwik`.`piwik_log_link_visit_action` TO 'snowpik'@'localhost' WITH GRANT OPTION;
GRANT SELECT ON `piwik`.`piwik_log_conversion` TO 'snowpik'@'localhost' WITH GRANT OPTION;
GRANT SELECT ON `piwik`.`piwik_log_conversion_item` TO 'snowpik'@'localhost' WITH GRANT OPTION;
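
To confirm the grants took effect, log in as the new user and run a harmless read-only query - a minimal sketch, assuming your Piwik database is named piwik, your table prefix is piwik_ and you are connecting from localhost:

# Inspect the user's grants and read one of the five tables
$ mysql -h localhost -u snowpik -p \
    -e "SHOW GRANTS; SELECT COUNT(*) FROM piwik_log_visit;" piwik

If the SELECT fails, re-check the GRANT statements above and the table prefix used by your Piwik installation.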

4. Setup your Configuration File

Next you need to create a configuration file, setting both your Piwik MySQL and Amazon S3 connection details. You can find a template configuration file within the .zip file.

Here is a configuration file template; please update it to match your setup:

#################################
# SnowPik Default Configuration #
#################################

snowpik {
  db {
    username: "PIWIK DB USERNAME"
    password: "PIWIK DB PASSWORD"
    prefix:   "PIWIK DB TABLE PREFIX EG piwik_"
    database: "PIWIK DB NAME"
    server:   "PIWIK DB SERVER EG localhost"
    port:     "PIWIK DB PORT EG 3306"
  }
  export {
    folder:   "LOCAL FOLDER TO STORE LOGS"
  }
  upload {
    key:      "AWS KEY ID"
    secret:   "AWS SECRET ACCESS KEY"
    bucket:   "AWS S3 BUCKET NAME"
  }
}
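
Because this file contains your database password and AWS secret access key, it is sensible to restrict who can read it - for example, assuming you saved it as /opt/snowpik/snowpik.conf:

# Make the credentials file readable by its owner only
$ sudo chmod 600 /opt/snowpik/snowpik.conf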

5. Run against your historic Piwik data

You can run SnowPik against your Piwik data collected to date using this command:

./snowpik.sh --config snowpik.conf --period historic

This command processes all of your Piwik data up until the end of yesterday (based on your host computer's clock). Today's data is ignored because today's log data is as yet incomplete.
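
Once the run completes, you can spot-check the results. A minimal sketch, assuming your configured export folder is /var/snowpik/logs, your bucket is my-piwik-archive, and you have the third-party s3cmd tool installed and configured (none of these names are part of SnowPik itself):

# Check the .csv files were written locally...
$ ls /var/snowpik/logs

# ...and that they arrived in your S3 bucket
$ s3cmd ls s3://my-piwik-archive/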

6. Setup a daily job

Once you have uploaded your historical Piwik data to S3, you can set up a daily cron job to export each day's new Piwik log data to S3. To do this:

$ sudo vi /etc/crontab

And add the line:

0 5 * * *   root  /opt/snowpik/cronic /opt/snowpik/snowpik.sh --period YESTERDAY

Assuming that you have SnowPik installed in /opt/snowpik, this will run SnowPik every day at 5am, transferring the data Piwik collected yesterday into Amazon S3. cronic is a third-party wrapper script to improve cron's email handling; it's bundled with SnowPik.
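
Before trusting the schedule, it is worth running the same command once by hand as root; note that the crontab line above falls back on SnowPik's default configuration (see the usage below), so you may want to add an explicit --config pointing at your own file. A sketch, assuming the paths used above:

# Run the cron command line manually to confirm it works end-to-end
$ sudo /opt/snowpik/cronic /opt/snowpik/snowpik.sh \
    --config /opt/snowpik/snowpik.conf --period YESTERDAY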

Full Usage Instructions

The full capabilities of SnowPik are as per the command-line usage:

OPTIONS
-c filename
--config filename  Configuration file. Defaults to "resources/default.conf"
                   (within .jar) if not set
-n
--noupload         Flags that the generated .csv files should not be uploaded
                   to S3
-p time
--period time      Time period of data to extract. Either "yesterday" or
                   "historic". NOT YET IMPLEMENTED
-s id
--site id          Piwik site id to extract data for. Defaults to 1    
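
For example, combining these options - the following invocations are purely illustrative, using the flags documented above:

# Extract data for Piwik site id 2, without uploading the .csv files to S3
$ ./snowpik.sh --config snowpik.conf --site 2 --noupload

# The same invocation using the short-form flags
$ ./snowpik.sh -c snowpik.conf -s 2 -n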

Please note that the period argument has yet to be implemented. SnowPik currently extracts all Piwik log data right up to the present moment.

Copyright and License

SnowPik, the SnowPlow-Piwik Connector, is copyright 2012 Orderly Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
