"Data-savvy graduate (May 2023) seeking a Data Engineer/Analyst role to solve a company's biggest business problems"
• Scraped 24,000+ products across 25+ categories from Walmart, along with their metadata, to develop “Walmart Lens”.
• Eliminated data redundancy and skewness through pre-processing and cleaning.
• Ran around 25,000 input images through a CNN model to list all the recognized products (<=10).
• Worked with GCP's Vertex AI and Google Cloud Vision to generate labels and detect texts (<=10) in an image.
• Labeled the products using the Amazon Rekognition Custom Labels API and Lambda functions.
• Web App
• Performed data wrangling and EDA on real-world data covering 84,000+ building units sold in NYC over a year.
• Created a quantile regression model of area vs. sale price and applied Recursive Feature Elimination (RFE) to select the top 10 features.
• Designed a Random Forest Regressor model to compare its results with RFE.
• Built a neural network and improved its accuracy using the Adam optimizer.
• Developed a script to scrape tweets (10,000+) in real time.
• Analyzed tweet data to determine the impressions a tweet makes based on keywords and usernames.
• Visualized the results with Seaborn and Matplotlib to show the extent of correlation between keywords and users.
• Improved the script's efficiency to scrape tweets related to multiple keywords synchronously.
• Performed data cleaning, wrangling, outlier detection, and stop-word removal on selected columns of the data.
• Determined relationships between categorical variables using the chi-square test and built Random Forest Regressors with different numbers of estimators, calculating the model score (R-squared) for each by fitting X and Y.
• Formulated Open Interest per minute per strike price for any given stock, NIFTY, and Bank NIFTY.
• BsScan Transaction Alert on Mail
Note: This may or may not indicate my skill level; it is just a GitHub metric of the languages in my commits.