Spring 2020
Life Events Prediction Using Social Media Data
- The goal of this project is to understand people’s expression of life events through social media platforms.
- I am developing machine learning models and drawing theory-driven insights from the data
- I also improved the existing ML models by refining their feature set
Oscar Winner Prediction Algorithm (CS 4641 Group Project)
- The goal of the project was to use movie-related data such as genre, box office earnings, running time, and release month to predict whether a movie would win an Oscar
- We trained models such as Linear Regression, SVM, Naive Bayes, and Neural Networks on the data to predict Oscar winners with 67% accuracy
- We concluded that although our algorithm predicts with better than 50% accuracy, it is still not very effective
- My role in this group project was to develop the deep neural network using TensorFlow
Song Lyrics Generation using LSTM Network (Ongoing Personal Project)
- The goal of this project is to train an LSTM network on a dataset of song lyrics so that it can generate its own lyrics
Glassdoor Project (SocWeB Lab Project)
- The goal of this project is to predict companies’ performance using Glassdoor posts from employees
- My role in this project was to preprocess the Glassdoor post dataset using NLTK and pandas and to train various scikit-learn ML models such as SVM, Naive Bayes, and Linear Regression
Fall 2019
Life Events Project (SocWeB Lab Project)
- The goal of this project is to use social media data to predict whether a major life event, such as the death of a loved one, marriage, or moving to a new house, is present in someone’s life
- My role in this project was to annotate the datasets, labeling whether a life event was present based on the content of each social media post
Song Popularity Prediction Algorithm Using Song Lyrics (Personal Project)
- The goal of this project was to predict a song's position on the Billboard Hot 100 chart based on its lyrics
- I used the Genius API and the Billboard API to collect a dataset of every song on the Hot 100 chart each month from 1980 to the present
- I trained a neural network that predicted a song's chart placement with 40% error and a Naive Bayes model that predicted it with 45% error
Summer 2019
Analyzing the Impact of Airballs in Basketball using SportVU (UPenn Project)
- The goal of this project was to determine whether airballs are correlated with other game events such as shots, rebounds, and passes
- I found a significant probability that a player gets benched after shooting an airball
- I used SportVU data files, which record player and ball coordinates 25 times per second, along with play-by-play data
Fall 2018
Real-time Ensemble Data for Understanding Suicide Epidemiology (SocWeB Lab Project)
- The goal of this project is to predict national suicide trends using social media data from sources such as Twitter and Tumblr, along with data provided by the CDC
- My role in this project was to write Twitter and Tumblr scrapers that collected all posts containing words from a list of keywords