NLP LSTM model to predict Python code (text prediction with tokenized special characters)

enockjamin01/autocode

Auto Code - NLP Code prediction

Auto Code is an NLP model that predicts Python code snippets from the code provided to it. It is an LSTM (Long Short-Term Memory) model with word embeddings.

It was trained on 606 rows of code snippets on a MacBook M1 Pro, with a total embedding vocabulary size of 237 words. The main thing I tried in this project was to make the TensorFlow tokenizer detect special characters.

Special characters are converted to text placeholders, tokenized, and then converted back to symbols after prediction.
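The round trip described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the `SYMBOL_TO_WORD` table and the `encode_symbols` / `decode_symbols` helpers are hypothetical names, and the real mapping may cover different symbols. Placeholders are lowercase words without underscores because the Keras `Tokenizer` lowercases text and filters out most punctuation by default.

```python
# Hypothetical symbol-to-word table; the project's real mapping may differ.
SYMBOL_TO_WORD = {
    "(": "openparen",
    ")": "closeparen",
    "[": "openbracket",
    "]": "closebracket",
    ":": "colon",
    "=": "equals",
    ",": "comma",
    ".": "dot",
}
WORD_TO_SYMBOL = {word: symbol for symbol, word in SYMBOL_TO_WORD.items()}


def encode_symbols(code):
    """Replace each known symbol with a space-separated word placeholder."""
    pieces = []
    for ch in code:
        if ch in SYMBOL_TO_WORD:
            pieces.append(" " + SYMBOL_TO_WORD[ch] + " ")
        else:
            pieces.append(ch)
    # Collapse runs of whitespace so the result is a clean token stream.
    return " ".join("".join(pieces).split())


def decode_symbols(text):
    """Map placeholder words back to symbols and return the token list."""
    return [WORD_TO_SYMBOL.get(token, token) for token in text.split()]


encoded = encode_symbols("print(x)")
print(encoded)                  # print openparen x closeparen
print(decode_symbols(encoded))  # ['print', '(', 'x', ')']
```

Note that this sketch returns a list of tokens and does not restore the original spacing or indentation; rebuilding formatted code from the tokens would need extra rules.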

Training for other languages

This model can be trained for other languages by creating a CSV file of code snippets in the respective language, structured like the dataset.csv file.

Modules to install

python3 -m pip install TensorFlow

pip install pandas

pip install numpy

pip install matplotlib

Frameworks and Languages used

  • TensorFlow
  • Pandas
  • NumPy
  • Matplotlib

Training with more data:

To train the model with more data, add more code snippets to the cells of dataset.csv, or add them as a Python list in the dataprocessing.ipynb file and run the code to append the data to the CSV file.
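Appending a Python list of snippets to a CSV file might look like the sketch below. It uses only the standard library and assumes one snippet per row in a single column; the actual layout of dataset.csv and the code in dataprocessing.ipynb may differ, and `append_snippets` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def append_snippets(csv_path, snippets):
    """Append each non-empty snippet as a single-column row.

    Returns the number of rows written.
    """
    written = 0
    # newline="" lets the csv module handle line endings and quoted newlines.
    with Path(csv_path).open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for snippet in snippets:
            if snippet.strip():
                writer.writerow([snippet])
                written += 1
    return written
```

For example, `append_snippets("dataset.csv", ["x = 1", "for i in range(10):\n    print(i)"])` would add two rows; multi-line snippets are quoted by the `csv` module so they stay in one cell.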

Splitting of data

  • Train = 0.8
  • Validation = 0.2
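An 80/20 split like the one above can be sketched in plain Python as follows. This is only an illustration: `train_validation_split` is a hypothetical helper, the shuffling and seed are assumptions, and the project may instead rely on the `validation_split` argument of Keras `model.fit`.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# With 606 rows, as in this project, the split is 485 / 121.
train, validation = train_validation_split(range(606))
print(len(train), len(validation))  # 485 121
```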

Accuracy and Loss plot

  • Accuracy: [Figure: Accuracy]
  • Loss: [Figure: Loss]

License

MIT