Fix ValueError: insecure string pickle
On my machine, the existing code raised ValueError: insecure string pickle when loading words_file. I attribute this to the size of words_file.

To fix this, I load words_file with cPickle, the C implementation of pickle.

I hope this helps other students.
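
For anyone applying the same idea elsewhere, here is a minimal sketch of the loading pattern, not the code in this commit. It makes two assumptions that go beyond the change itself: the helper name load_pickle is hypothetical, and the file is opened in binary mode ("rb") inside a with block, whereas the commit keeps text mode ("r") and closes the handles explicitly. The import falls back to the standard pickle module where cPickle does not exist (Python 3).

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2: C implementation of pickle
except ImportError:
    import pickle  # Python 3: cPickle is gone; pickle uses the C accelerator itself


def load_pickle(path):
    """Load a single pickled object from `path` (hypothetical helper)."""
    # Assumption of this sketch: binary mode, so no newline translation
    # is applied to the pickle stream. The commit itself opens with "r".
    with open(path, "rb") as handle:
        # The with block closes the file even if pickle.load raises.
        return pickle.load(handle)


if __name__ == "__main__":
    # Round-trip a small stand-in for word_data to show the helper in use.
    sample = ["first email body", "second email body"]
    fd, tmp_path = tempfile.mkstemp(suffix=".pkl")
    os.close(fd)
    try:
        with open(tmp_path, "wb") as handle:
            pickle.dump(sample, handle)
        assert load_pickle(tmp_path) == sample
    finally:
        os.remove(tmp_path)
```

With a helper like this, the two loads in preprocess() could each become a single call, for example word_data = load_pickle(words_file). Whether binary mode alone would also avoid the error on a given machine is untested here.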
jaycode committed Jun 17, 2015
1 parent efa6e32 commit 8670a39
Showing 1 changed file with 9 additions and 4 deletions.
13 changes: 9 additions & 4 deletions tools/email_preprocess.py
@@ -1,6 +1,7 @@
#!/usr/bin/python

import pickle
+import cPickle
import numpy

from sklearn import cross_validation
@@ -28,8 +29,13 @@ def preprocess(words_file = "../tools/word_data.pkl", authors_file="../tools/ema

### the words (features) and authors (labels), already largely preprocessed
### this preprocessing will be repeated in the text learning mini-project
-word_data = pickle.load( open(words_file, "r"))
-authors = pickle.load( open(authors_file, "r") )
+authors_file_handler = open(authors_file, "r")
+authors = pickle.load(authors_file_handler)
+authors_file_handler.close()
+
+words_file_handler = open(words_file, "r")
+word_data = cPickle.load(words_file_handler)
+words_file_handler.close()

### test_size is the percentage of events assigned to the test set (remainder go into training)
features_train, features_test, labels_train, labels_test = cross_validation.train_test_split(word_data, authors, test_size=0.1, random_state=42)
@@ -54,6 +60,5 @@ def preprocess(words_file = "../tools/word_data.pkl", authors_file="../tools/ema
### info on the data
print "no. of Chris training emails:", sum(labels_train)
print "no. of Sara training emails:", len(labels_train)-sum(labels_train)



return features_train_transformed, features_test_transformed, labels_train, labels_test
