Why do you use only the training dataset (TRAIN_PATH) to build the vocabulary for encoding? #67

hoogang · 2018-05-18T07:22:37Z

print("Creating vocabulary...")
input_iter = create_csv_iter(TRAIN_PATH)
input_iter = (x[0] + " " + x[1] for x in input_iter)
vocab = create_vocab(input_iter, min_frequency=FLAGS.min_word_frequency)
print("Total vocabulary size: {}".format(len(vocab.vocabulary_)))

# Create vocabulary.txt file
write_vocabulary(
    vocab, os.path.join(FLAGS.output_dir, "vocabulary.txt"))

# Save vocab processor
vocab.save(os.path.join(FLAGS.output_dir, "vocab_processor.bin"))

Shouldn't the vocabulary be built from all of the datasets (the train, validation, and test paths) rather than from the training set alone?
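To make the suggestion concrete, here is a minimal sketch of what I mean, not the repository's actual code. `iter_csv_text` and `build_vocab` are hypothetical helpers written with the standard library only; the sketch assumes each CSV has a header row and text in its first two columns (mirroring the `x[0] + " " + x[1]` join above), and it uses plain whitespace tokenization instead of the vocab processor.

```python
import csv
import itertools
from collections import Counter


def iter_csv_text(path):
    """Yield the first two columns of each data row, joined by a space.

    Hypothetical helper: assumes the first row is a header and that
    every data row has at least two columns.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row (assumption)
        for row in reader:
            yield row[0] + " " + row[1]


def build_vocab(paths, min_frequency=1):
    """Count whitespace-separated tokens across ALL given files.

    Returns the sorted list of tokens that occur at least
    min_frequency times in the combined data.
    """
    counts = Counter()
    all_text = itertools.chain.from_iterable(iter_csv_text(p) for p in paths)
    for text in all_text:
        counts.update(text.split())
    return sorted(tok for tok, n in counts.items() if n >= min_frequency)
```

With something like this, the vocabulary would come from a call such as `build_vocab([TRAIN_PATH, VALIDATION_PATH, TEST_PATH], min_frequency=5)`, where `VALIDATION_PATH` and `TEST_PATH` are assumed names for the other two files.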

Thanks, I hope to hear your answer.
