Any idea to get the performance to 70% #20
I get the sense that it has something to do with fine-tuning the hyperparameters, or maybe they used better pre-trained embeddings... The best result I've gotten so far was around 55% using a generative RNN model plus an embedding layer, though I was hoping for better. I'd be really interested to see if someone can duplicate their results.
I was looking through some of it yesterday and realized my
hi, @codekansas Epoch 49 :: 2016-08-04 00:35:19 :: Train on 16686 samples, validate on 1854 samples
hi, @codekansas @eshijia |
I trained the attention model and printed out some predicted and expected answers, then dumped them in this gist. You guys can decide for yourselves. I'm more or less ready to change datasets; the top-1 precision was still much worse than with the basic embedding model.
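For reference, top-1 precision here just means the fraction of questions for which the highest-scoring candidate answer is a ground-truth answer. A minimal sketch of that metric (the function name is mine, not from the repo):

```python
def top1_precision(scores, labels):
    """scores: per-question lists of candidate scores;
    labels: per-question 0/1 ground-truth lists (same shape)."""
    hits = 0
    for s, l in zip(scores, labels):
        best = max(range(len(s)), key=lambda i: s[i])  # index of top-ranked candidate
        hits += l[best]  # 1 if the top candidate is a correct answer
    return hits / len(scores)

# two questions: top candidate correct in the first, wrong in the second
print(top1_precision([[0.9, 0.2], [0.1, 0.8]], [[1, 0], [0, 1]]))
```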
There is a theano version for this task (and the paper). Its results match the paper's. I haven't read the theano code carefully, but I believe the implementation is different from ours. When I have enough time, I will try to hack on the code to find out whether I can improve it.
Just to report back that I had no luck with my attempt using cosine similarity.
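For anyone trying the same, the cosine variant just swaps in a different similarity function between the pooled question and answer vectors. A hedged numpy sketch (not the repo's exact code):

```python
import numpy as np

def cosine_similarity(q, a, eps=1e-8):
    # cosine of the angle between question and answer vectors;
    # eps guards against division by zero for all-zero vectors
    return np.dot(q, a) / (np.linalg.norm(q) * np.linalg.norm(a) + eps)

q = np.array([1.0, 0.0])
a = np.array([1.0, 1.0])
print(round(cosine_similarity(q, a), 4))  # 45-degree angle -> ~0.7071
```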
Hi, I suggest we try using the V2 data. There is a choice of pool size; I think they may have gotten the 70% by using the smallest pool.
I noticed the two scripts run for 2,000,000 (CNN) and 20,000,000 (LSTM+CNN) batches, so it must have taken a really long time to train. The results I included were after training for only about 30,000 batches.
20,000,000! That does not look realistic for departments without access to a supercomputer. It takes me a day to run 100 epochs with a batch size of 20; I would need 10,000 days to get to 70%...
I have asked the author of the theano version. He told me that it took about 1 day to run 20,000,000 batches on his Tesla GPU, but I don't think it really needs 2,000,000 batches. In addition, he used character-level embeddings.
Wow, I did not realize the Teslas are so fast... I'll just run it for a while on my 980ti, I suppose. Character-level embeddings, though? It looks like regular word embeddings here. I would really like to replicate their result, haha.
Hi, will the code run on the tensorflow backend in its current state? I'm asking because I think I need to run it on multiple GPUs to improve training speed. This thread says that Keras supports multiple GPUs when running with the tensorflow backend but not the theano backend. If it cannot run on the tensorflow backend at the moment, how can I change (hopefully just a couple of lines) to get it to run on tensorflow?
I think the performance really depends on how long you run it. I ran a CNN-LSTM model for ~700 epochs and got a precision of 0.52; I'm going to run it longer to see if it improves.

```python
conf = {
    'question_len': 150,
    'answer_len': 150,
    'n_words': 22353,  # len(vocabulary) + 1
    'margin': 0.05,

    'training_params': {
        'print_answers': False,
        'save_every': 1,
        'batch_size': 100,
        'nb_epoch': 3000,
        'validation_split': 0.1,
        'optimizer': SGD(lr=0.05),  # Adam(clipnorm=1e-2),
    },

    'model_params': {
        'n_embed_dims': 100,
        'n_hidden': 200,

        # convolution
        'nb_filters': 500,  # * 4
        'conv_activation': 'tanh',

        # recurrent
        'n_lstm_dims': 141,  # * 2

        'initial_embed_weights': np.load('models/word2vec_100_dim.h5'),
        'similarity_dropout': 0.25,
    },

    'similarity_params': {
        'mode': 'gesd',
        'gamma': 1,
        'c': 1,
        'd': 2,
    }
}

evaluator = Evaluator(conf)

##### Define model ######
model = CNNLSTM(conf)
optimizer = conf.get('training_params', dict()).get('optimizer', 'rmsprop')
model.compile(optimizer=optimizer)

# train the model
best_loss = evaluator.train(model)
evaluator.load_epoch(model, best_loss['epoch'])
evaluator.get_score(model, evaluate_all=True)
```

```python
class CNNLSTM(LanguageModel):
    def build(self):
        question = self.question
        answer = self.get_answer()

        # add embedding layers
        weights = self.model_params.get('initial_embed_weights', None)
        weights = weights if weights is None else [weights]
        embedding = Embedding(input_dim=self.config['n_words'],
                              output_dim=self.model_params.get('n_embed_dims', 100),
                              weights=weights,
                              # mask_zero=True)
                              mask_zero=False)
        question_embedding = embedding(question)
        answer_embedding = embedding(answer)

        f_rnn = LSTM(self.model_params.get('n_lstm_dims', 141), return_sequences=True, consume_less='mem')
        b_rnn = LSTM(self.model_params.get('n_lstm_dims', 141), return_sequences=True, consume_less='mem')

        qf_rnn = f_rnn(question_embedding)
        qb_rnn = b_rnn(question_embedding)
        question_pool = merge([qf_rnn, qb_rnn], mode='concat', concat_axis=-1)

        af_rnn = f_rnn(answer_embedding)
        ab_rnn = b_rnn(answer_embedding)
        answer_pool = merge([af_rnn, ab_rnn], mode='concat', concat_axis=-1)

        # cnn
        cnns = [Convolution1D(filter_length=filter_length,
                              nb_filter=self.model_params.get('nb_filters', 500),
                              activation=self.model_params.get('conv_activation', 'tanh'),
                              # W_regularizer=regularizers.l1(1e-4),
                              # activity_regularizer=regularizers.activity_l1(1e-4),
                              border_mode='same') for filter_length in [1, 2, 3, 5]]
        question_cnn = merge([cnn(question_pool) for cnn in cnns], mode='concat')
        answer_cnn = merge([cnn(answer_pool) for cnn in cnns], mode='concat')

        maxpool = Lambda(lambda x: K.max(x, axis=1, keepdims=False),
                         output_shape=lambda x: (x[0], x[2]))
        question_pool = maxpool(question_cnn)
        answer_pool = maxpool(answer_cnn)

        return question_pool, answer_pool
```
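For what it's worth, the 'gesd' mode in similarity_params refers to the GESD metric from the insuranceQA paper: the product of a Euclidean term and a sigmoid of the dot product. A numpy sketch of my understanding of it, using the gamma and c values above (how the repo wires in these parameters is my assumption):

```python
import numpy as np

def gesd(q, a, gamma=1.0, c=1.0):
    # Euclidean part: close vectors -> value near 1
    euclidean = 1.0 / (1.0 + np.linalg.norm(q - a))
    # sigmoid part: large dot product -> value near 1
    sigmoid = 1.0 / (1.0 + np.exp(-gamma * (np.dot(q, a) + c)))
    return euclidean * sigmoid

# identical vectors: Euclidean term is exactly 1, so only the sigmoid remains
q = np.zeros(3)
print(gesd(q, q))
```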
Ended up with
after training for about 4-5 days on my 980ti. I can see how after enough iterations you could get up to ~60-70%, but my GPU would take way too long... |
Sounds great! I would like to follow your training progress. The duration of one epoch with the CNNLSTM model is 490s for me, so it will take about 17 days to complete 3000 epochs. My GPU is a Tesla K20c. By the way, I think another important thing is to make the code fit the latest Keras version :)
@eshijia |
17 days seems slow for that GPU? I wonder if it is slow for some reason (maybe it's running on the CPU instead of the GPU?). But 3000 epochs * 16686 samples per epoch is 50,058,000 samples, whereas the other script ran 20,000,000 batches * 128, or 2,560,000,000 samples. On my GPU (980ti) it will take ~6.4 days to train 3000 epochs; it would take nearly a year to train on as many samples as their model used. Also, I found a big difference in training while using different optimizers. I think the Adadelta optimizer works well; RMSprop was overfitting a lot.
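The sample-count arithmetic above can be checked directly:

```python
# my run: 3000 epochs over the 16686-sample training set
epochs, samples_per_epoch = 3000, 16686
my_total = epochs * samples_per_epoch
print(my_total)  # 50,058,000 samples

# their run: 20,000,000 batches of 128
their_batches, batch_size = 20_000_000, 128
their_total = their_batches * batch_size
print(their_total)  # 2,560,000,000 samples

print(their_total // my_total)  # they trained on roughly 51x more samples
```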
It is really running on the GPU. There are four GPU devices (K20c) in my server, and each of them always runs different tasks at the same time. I can see that the GPU utilization of the device running this task is 96% with the command
I think my Tesla GPU is really old; its configuration is not up to a 980ti.
@wailoktam Could you share how you changed the training part to make sure the bad answers are really bad answers?
My pleasure.
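Since the shared change didn't survive in this thread, the general idea (my reconstruction, not @wailoktam's actual code) is to re-draw each randomly sampled negative answer whenever it collides with one of the question's ground-truth answers:

```python
import random

def sample_bad_answer(all_answer_ids, good_ids, rng=random):
    """Pick a random answer id that is NOT a known good answer for this question."""
    bad = rng.choice(all_answer_ids)
    while bad in good_ids:  # re-draw on collision with a ground-truth answer
        bad = rng.choice(all_answer_ids)
    return bad

random.seed(0)
bad = sample_bad_answer(list(range(10)), {0, 1, 2})
print(bad)  # guaranteed not to be 0, 1, or 2
```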
I think I can also share the version-2 insurance data and the Japanese wiki data, which I have structured to be used with this great work of codekansas. However, I am running them without pretrained word2vec weights, because the program complains about the mismatched vocabulary sizes. As you guys can tell, without the pretrained weights it will take even longer to get the 70% claimed.
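The vocabulary-size complaint happens because the Embedding layer's weight matrix must be exactly (n_words, n_embed_dims). One common workaround (a sketch under my own assumptions, not code from this repo) is to rebuild the weight matrix from the pretrained vectors, initializing words missing from the pretrained model randomly:

```python
import numpy as np

def build_embedding_weights(vocab, pretrained, dim=100, seed=0):
    """vocab: word -> row index (1-based); pretrained: word -> vector (may be missing words)."""
    rng = np.random.RandomState(seed)
    # small random init for every row, including row 0 reserved for padding
    weights = rng.uniform(-0.05, 0.05, size=(len(vocab) + 1, dim))
    for word, idx in vocab.items():
        if word in pretrained:
            weights[idx] = pretrained[word]  # copy the pretrained vector where available
    return weights

vocab = {'insurance': 1, 'claim': 2}
pretrained = {'insurance': np.ones(100)}  # 'claim' stays randomly initialized
w = build_embedding_weights(vocab, pretrained)
print(w.shape)  # (3, 100)
```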
I have tried training the CNNLSTM model for about 3000 epochs, and the loss is stable at about 0.0013. The test results are just the same as @codekansas mentioned above.
Hi, I mean without doing anything not in the paper (dos Santos 2016).
I am mentioning 70% because it is what the author of this paper reported for the LSTM + attention model on the insuranceQA data. I get 40-something, like codekansas. Can I be confident in blaming dos Santos for faking the result?