showing error #11
Hey @harshalpatilnmu, try these:
Also, there are pre-trained models you can download and try out, see here:
My dataset is different, so first I need to train the model before I can run it. How can I train my own model?
Make sure your console working directory is D:\chatbot\seq2seq-chatbot-master\seq2seq-chatbot. You should be able to see the hparams.json file directly in this folder. If you are unsure of the working directory, run it from an ipython console and set it manually:

ipython
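Once the ipython console is open, something like this (a minimal sketch using only the standard library; the path is the one from this thread) sets and confirms the working directory before running train.py:

```python
import os

# Point the session at the repo folder that contains hparams.json,
# then verify the file is actually visible from here.
os.chdir(r"D:\chatbot\seq2seq-chatbot-master\seq2seq-chatbot")
print(os.getcwd())
print(os.path.isfile("hparams.json"))  # should print True before running train.py
```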
I set the directory, but it still shows an error:

(aiml) D:\chatbot\seq2seq-chatbot-master\seq2seq-chatbot>python train.py --datasetdir=datasets\chatbot_dataset
(aiml) D:\chatbot\seq2seq-chatbot-master\seq2seq-chatbot>
The error message is saying:

json.decoder.JSONDecodeError: Expecting ',' delimiter: line 57 column 33

Check the hparams.json file to make sure no comma is missing on a line that should have one. If you are not sure, copy and paste the file into the left box here: https://jsoneditoronline.org/ and it will automatically detect formatting errors. I tried this with the committed version in the repository and there are no errors detected.
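You can also check it locally; here is a quick sketch using only Python's standard library, which reports the line and column of the first formatting problem:

```python
import json

# Try to parse hparams.json; JSONDecodeError carries the line/column
# of the first problem (e.g. a missing comma).
try:
    with open("hparams.json", "r") as f:
        json.load(f)
    print("hparams.json is valid JSON")
except json.JSONDecodeError as e:
    print("Formatting error at line {}, column {}: {}".format(e.lineno, e.colno, e.msg))
```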
(aiml) D:\chatbot\seq2seq-chatbot-master\seq2seq-chatbot>python train.py --datas
Reading dataset 'cornell_movie_dialog'...
Looks like your dataset is probably not formatted the same way as the cornell movie dialog dataset. You will need to implement a reader for your custom dataset:
- See cornell_dataset_reader.py - this class implements the reader that converts the raw cornell files "movie_lines.txt" and "movie_conversations.txt" into the dict.
- Duplicate this class, rename it, and tweak the implementation to work with your own dataset format - all that matters is that the output is the same.
- Once the new reader is implemented, register an instance of it in the dataset_reader_factory.

Alternatively, if you don't want to do all of this, modify your dataset so that it follows the same format as the cornell movie dialog dataset.
I have a CSV file in which the data is formatted as questions and answers, so how can I read it in dataset_reader_factory? I used the pd.read_csv() function, but I got stuck in your code - how do I use id2line and conversations? In my case the data is ready; I don't need to split and replace it. Could you help me write the code?

from dataset_readers.dataset_reader import DatasetReader

class CornellDatasetReader(DatasetReader):
The base class is expecting the data in the format of a conversational log, and it infers question-answer pairs from consecutive lines of that log. If you already have your data as question-answer pairs, unfortunately you will need to present it as a log and let the base class put it back in that form. For now, you can take each question-answer pair from your CSV and do this (pseudo code):
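Something along these lines (a sketch of that pseudo code; the id2line dict and conversations list are the structures mentioned above for the cornell reader, and the CSV path and column names here are assumptions - adapt them to your file):

```python
import pandas as pd

# Read the question/answer CSV (column names "question" and "answer" are assumed).
df = pd.read_csv("datasets/chatbot_dataset/qa_pairs.csv")

id2line = {}        # synthetic line id -> line text
conversations = []  # list of conversations, each a list of line ids

for i, row in df.iterrows():
    q_id = "L{}_q".format(i)
    a_id = "L{}_a".format(i)
    id2line[q_id] = str(row["question"])
    id2line[a_id] = str(row["answer"])
    # Present each Q/A pair as a tiny two-line "conversation" so the base
    # class can turn consecutive lines back into a question-answer pair.
    conversations.append([q_id, a_id])
```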
One additional thing - you should set [...]. Alternately, if you are willing to share your CSV, I can implement the reader and train it on my Titan V GPU.
Hi AbrahamSanders,
@harshalpatilnmu, pull down csv_dataset_reader.py and dataset_reader_factory.py. Make sure to save your data as a CSV (I don't know if Pandas will accept .xlsx). Finally, follow the instructions here. Let me know how it goes!

Some additional notes on hparam configuration (hparams.json):
- If you have a basic Q&A dataset, set the hparam inference_hparams/conv_history_length to 0 so that it will treat each question independently while chatting.
- You can reduce the size of your model if you have a smaller dataset. The default is pretty big - 4 layer encoder/decoder, 1024 cell units per layer.
- You can choose to train with the sgd or adam optimizers - the default learning rate is good for sgd, but if you use adam then lower it to 0.001.
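For example, here is a small sketch (standard library only) of flipping that setting programmatically; the inference_hparams/conv_history_length key path is taken from this thread, so double-check it against how your hparams.json is actually nested:

```python
import json

# Load, tweak, and rewrite hparams.json. This assumes conv_history_length
# lives under a top-level "inference_hparams" object - adjust if your file
# nests it differently.
with open("hparams.json", "r") as f:
    hparams = json.load(f)

hparams["inference_hparams"]["conv_history_length"] = 0  # treat each question independently

with open("hparams.json", "w") as f:
    json.dump(hparams, f, indent=2)
```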
Reply:

# Reader class for the Cornell movie dialog dataset
from os import path

class CornellDatasetReader(DatasetReader):
error:

Reading dataset 'cornell_movie_dialog'...
@harshalpatilnmu please follow the directions in my last post. Revert cornell_dataset_reader.py and pull down the new reader as per my post. This should be able to process your CSV - I tested it successfully on the dummy data you sent me. Also, make sure your data is in the directory
Thanks a lot for the support - the model now trains on my dataset properly. I set the hparam inference_hparams/conv_history_length to 0, but it shows repeated answers: when I type a question the first time it gives the correct answer, but the second time, when I pass some information, the chatbot returns the previous output again. How can I avoid this?
@harshalpatilnmu you're welcome - I'm glad training is working for you now. Here are a few considerations to help resolve your issue:
To use pre-trained embeddings, follow these suggestions:
b) If your dataset is mostly technical, proprietary, or domain-specific words (or words in a language other than English): to run it, use the training batch file with nnlm_en embeddings.
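For reference, nnlm_en here presumably refers to the NNLM English embedding module on TensorFlow Hub; a minimal standalone sketch of loading it (the TF1-style hub.Module API and the specific module URL are assumptions - the repo's training batch file wires this up for you):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load a 128-dimensional NNLM English embedding module from TF Hub
# (module URL assumed - check which variant the batch file actually uses).
embed = hub.Module("https://tfhub.dev/google/nnlm-en-dim128/1")
embeddings = embed(["how do I train my model", "hello there"])

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(sess.run(embeddings).shape)  # (2, 128)
```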
I hope I have given you enough info to optimize your model. Let me know how it goes, I am happy to answer any questions!
File size is 157 KB.
When I try to train the model, it shows an error:
(aiml) D:\chatbot\seq2seq-chatbot-master\seq2seq-chatbot>python train.py --datasetdir=datasets\chatbot_dataset
Traceback (most recent call last):
  File "train.py", line 14, in <module>
    dataset_dir, model_dir, hparams, resume_checkpoint = general_utils.initialize_session("train")
  File "D:\chatbot\seq2seq-chatbot-master\seq2seq-chatbot\general_utils.py", line 45, in initialize_session
    copyfile("hparams.json", os.path.join(model_dir, "hparams.json"))
  File "C:\Users\1patilha\AppData\Local\Continuum\anaconda3\envs\aiml\lib\shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'hparams.json'