Alchemy lab by Quy Vu

This is where crazy ideas get experimented with. Lots of them didn't work, but some turned out to be interesting!

About me:

  • Machine learning and deep learning with Python and MATLAB
  • Data extraction with SQL
  • Cloud computing in the Azure ecosystem
  • Data engineering with PySpark
  • Presenting results with Power BI
  • Version control with Git

Currently part of the Data and Analytics team at Mott MacDonald in London. Previously worked as a consultant at PwC and KPMG in Vietnam. MSc in Data Science from City, University of London.

Play Dota 2 and basketball in my free time. Worship dogs.

Computer vision

Face and character detection and recognition

  • Processed group and individual photos and extracted features for 3 ML algorithms.
  • Performed transfer learning (VGG-Net) to achieve 99.25% accuracy.
  • Wrote a program to detect and recognise faces in pictures and video.
  • Also detects and recognises any number shown in the picture.

Report

Optical character recognition

  • Highlighted the main differences between CNNs and MLPs and compared their performance on an OCR task.
  • Shuffled training data labels to see whether the networks could still learn - THEY CAN! This turned out to be an active research area!
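The label-noise experiment boils down to reassigning a random slice of the training labels before fitting. A minimal sketch in plain Python (the fraction and class count below are illustrative, not the ones used in the report):

```python
import random

def corrupt_labels(labels, fraction, num_classes, seed=0):
    """Randomly reassign a fraction of labels, as in label-noise experiments."""
    rng = random.Random(seed)
    corrupted = list(labels)
    n = int(len(labels) * fraction)
    for i in rng.sample(range(len(labels)), n):
        corrupted[i] = rng.randrange(num_classes)
    return corrupted

labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] * 10
noisy = corrupt_labels(labels, fraction=0.5, num_classes=10)
changed = sum(a != b for a, b in zip(labels, noisy))
print(changed)  # at most 50 labels actually change (a redraw can hit the old class)
```

Training on `noisy` while evaluating on the clean labels is the basic setup for measuring how much a network memorises versus generalises.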

Report

Reinforcement learning

  • Trained an agent to solve the Cliff Walking problem using Q-learning and SARSA.
  • Experimented with different learning parameters: exploration factor (epsilon), decay factor (lambda), learning policy, learning rate, and discount factor.
  • Concluded that the agent trained with only Q-learning is quite dumb ...
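The two algorithms differ only in the bootstrap target of their update rule. A minimal tabular sketch in plain Python (the toy Q-table is a placeholder, not the Cliff Walking grid itself; alpha and gamma values are illustrative):

```python
import random

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Off-policy: bootstraps from the BEST action in the next state.
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: bootstraps from the action ACTUALLY chosen next.
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])

def epsilon_greedy(Q, s, epsilon, rng=random):
    # Explore with probability epsilon, otherwise act greedily.
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])

# Toy table: 2 states x 2 actions, all zeros.
Q = [[0.0, 0.0], [0.0, 0.0]]
q_learning_update(Q, s=0, a=0, r=-1, s_next=1)
print(Q[0][0])  # -0.1: the value moved towards the reward of -1
```

That single difference in the target is why SARSA learns the safer path along the cliff while plain Q-learning hugs the edge.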

Report | Repo

Imbalance learning

Credit card fraud detection

  • Used PCA to identify important predictors.
  • My first machine learning project at City, University of London.
  • As the main goal was to understand the data, I selected logistic regression and a decision tree. More sophisticated models could obviously achieve better performance.
  • Earned me a final interview at Goldman Sachs.
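The notebook's exact handling of the class imbalance isn't spelled out here, but a common baseline for fraud-scale skew is random undersampling of the majority class before fitting. A sketch with made-up data (the 5% fraud rate is illustrative):

```python
import random

def undersample(samples, labels, seed=0):
    """Balance a binary dataset by downsampling the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    majority, minority = (neg, pos) if len(neg) > len(pos) else (pos, neg)
    kept = rng.sample(majority, len(minority)) + minority
    rng.shuffle(kept)
    return [samples[i] for i in kept], [labels[i] for i in kept]

X = [[i] for i in range(100)]
y = [1] * 5 + [0] * 95            # 5% fraud, typical of fraud datasets
Xb, yb = undersample(X, y)
print(sum(yb), len(yb))           # 5 10 - equal classes after undersampling
```

The trade-off is throwing away majority-class data; oversampling or class weights are the usual alternatives.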

Notebook | Repo

Big data

Natural Language Processing

Spam detection with Logistic Regression, Naive Bayes, Support Vector Machines + PySpark

  • Experimented with 3 ML algorithms to classify spam messages.
  • The bigger the hash vector, the better the predictions, since different words are less likely to be assigned to the same position. However, there was no improvement once the vector size exceeded 3,000: the vocabulary may not be that diverse.
  • Normalising samples to unit L1 or L2 norm limited the SVM's accuracy to ~84%, while skipping normalisation boosted it to 90%. This can be regarded as an alternative to tuning the SVM's kernel function.
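The collision argument in the second point can be sketched in plain Python. This mimics the hashing trick behind Spark's HashingTF rather than reproducing its implementation; `hashlib.md5` serves only as a stable stand-in hash (Python's built-in `hash` is salted per process):

```python
import hashlib

def hash_vector(tokens, size):
    """Map tokens into a fixed-size count vector (the 'hashing trick')."""
    vec = [0] * size
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % size] += 1
    return vec

tokens = "free prize click here to claim your free prize".split()
small = hash_vector(tokens, 8)       # collisions likely: 7 distinct words, 8 buckets
large = hash_vector(tokens, 3000)    # collisions rare at this size
print(sum(1 for v in large if v > 0))  # number of distinct buckets used
```

With a small vector, unrelated words share buckets and blur the features; growing the vector helps only until every word in the (limited) vocabulary gets its own bucket, which matches the plateau around 3,000.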

Notebook

Sentiment classification of Amazon Reviews with PySpark

  • Used Logistic Regression and Naive Bayes to classify 4 million Amazon reviews.
  • Set up a PySpark pipeline to process the data.
  • TF-IDF gave similar results to word2vec, but took less time to run.
  • A TF-IDF length of 10,000 gave an AUC score of 0.92. How about 100,000? No better - it turns out 10,000 features are enough for a single topic (product reviews).
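For illustration, here is what TF-IDF computes, in plain Python rather than the PySpark pipeline used in the project (the `log(n/df) + 1` smoothing is one common variant, not necessarily Spark's):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF vectors for a list of tokenised documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))   # document frequency
    idf = {tok: math.log(n / df[tok]) + 1 for tok in df}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({tok: (c / len(doc)) * idf[tok] for tok, c in counts.items()})
    return vectors

docs = [
    "great product would buy again".split(),
    "terrible product broke after a day".split(),
]
vecs = tf_idf(docs)
# 'product' appears in both reviews, so it gets the lowest idf weight
```

Words shared across all documents carry little signal, which is why a modest vocabulary size covers a single-topic corpus like product reviews.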

Notebook | Repo | Poster

Under construction ...

Contact

LinkedIn

Email: [email protected]
