Projects

Spring 2020

Life Events Prediction Using Social Media Data

  • The goal of this project is to understand how people express life events on social media platforms
  • I am developing machine learning models and drawing theory-driven insights from the data
  • I also improved the existing ML models by refining their feature set

Oscar Winner Prediction Algorithm (CS 4641 Group Project)

  • The goal of the project was to take movie-related data such as genre, box office earnings, movie length, and release month to predict whether a movie would win an Oscar
  • We trained models such as Linear Regression, SVM, Naive Bayes, and Neural Networks on the data to predict Oscar winners with 67% accuracy
  • We concluded that although our algorithm predicts with better than 50% accuracy, it is still not very effective
  • My role in this group project was to develop the deep neural network using TensorFlow
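
One way to put an accuracy figure like the one above in context is to compare it with a majority-class baseline, i.e. the accuracy of always predicting the most common label. The following is a minimal sketch of that comparison, not the project's evaluation code: the labels are made up for illustration, and the helper names `accuracy` and `majority_baseline` are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Made-up labels for illustration only: 1 = won an Oscar, 0 = did not.
y_true = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0, 0, 1, 1, 0]

model_acc = accuracy(y_true, y_pred)
baseline_acc = majority_baseline(y_true)
print(f"model: {model_acc:.2f}, majority baseline: {baseline_acc:.2f}")
```

When one class is much more common than the other, a model has to beat this baseline, not 50%, to be considered useful.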

Song Lyrics Generation using LSTM Network (Ongoing Personal Project)

  • The goal of this project is to train an LSTM network on a dataset of song lyrics so that it can generate its own lyrics

Glassdoor Project (SocWeB Lab Project)

  • The goal of this project is to predict companies’ performance using Glassdoor posts from employees
  • My role in this project was to preprocess the Glassdoor post dataset using NLTK and pandas and to train various scikit-learn ML models such as SVM, Naive Bayes, and Linear Regression

Fall 2019

Life Events Project (SocWeB Lab Project)

  • The goal of this project is to use social media data to predict whether someone has experienced a major life event, such as the death of a loved one, marriage, or moving to a new house
  • My role in this project was to annotate the datasets, labeling whether a life event was present based on the content of each social media post

Song Popularity Prediction Algorithm Using Song Lyrics (Personal Project)

  • The goal of this project was to predict a song's position on the Billboard Hot 100 chart based on its lyrics
  • I used the Genius API along with the Billboard API to collect a dataset of all the songs that were on the Hot 100 chart each month from 1980 to the present
  • I trained a neural network that predicted song placement with 40% error and a Naive Bayes model that predicted it with 45% error

Summer 2019

Analyzing the Impact of Airballs in Basketball using SportVU (UPenn Project)

  • The goal of this project was to see if there was a correlation between the occurrence of an airball and other game events such as shots, rebounds, and passes
  • I found a significant probability that a player gets benched after shooting an airball
  • I used SportVU data files, which contain player and ball coordinates 25 times per second, as well as play-by-play data
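
To illustrate what a 25-samples-per-second coordinate stream allows, here is a minimal sketch that converts frame indices to elapsed time and estimates ball speed between consecutive frames. It is not the project's code: the input format (a list of `(x, y)` tuples), the sample coordinates, and the helper names `frame_to_seconds` and `ball_speeds` are assumptions made for illustration. Only the 25 Hz rate comes from the description above.

```python
import math

FRAME_RATE_HZ = 25  # from the description: coordinates are recorded 25 times per second


def frame_to_seconds(frame_index):
    """Elapsed time in seconds for a given frame index (frame 0 is t = 0)."""
    return frame_index / FRAME_RATE_HZ


def ball_speeds(ball_xy):
    """Speed between consecutive frames, in coordinate units per second."""
    return [
        math.dist(a, b) * FRAME_RATE_HZ
        for a, b in zip(ball_xy, ball_xy[1:])
    ]


# Made-up ball positions for three consecutive frames (assumed format).
ball_xy = [(0.0, 0.0), (0.3, 0.4), (0.9, 1.2)]
speeds = ball_speeds(ball_xy)
print(speeds)
print(frame_to_seconds(50))
```

Speeds come out in whatever distance unit the coordinates use, per second.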

Fall 2018

Real-time Ensemble Data for Understanding Suicide Epidemiology (SocWeB Lab Project)

  • The goal of this project is to predict national suicide trends using social media data from sources such as Twitter and Tumblr as well as data provided by the CDC
  • My role in this project was to write a Twitter and Tumblr scraper to collect all posts containing words from a list of keywords
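
The keyword-matching step of such a scraper can be sketched as below. This covers only the filtering of already-fetched post text, not the Twitter or Tumblr API calls, and it is an illustration rather than the project's actual code. The function names, the placeholder keywords, and the choice of case-insensitive whole-word matching are all assumptions.

```python
import re


def build_keyword_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole word."""
    # Longest keywords first so multi-word phrases win over their prefixes.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def filter_posts(posts, keywords):
    """Return the posts whose text contains at least one keyword."""
    if not keywords:
        return []
    pattern = build_keyword_pattern(keywords)
    return [post for post in posts if pattern.search(post)]


# Placeholder keywords and posts, for illustration only.
keywords = ["sleep", "lonely"]
posts = [
    "Could not SLEEP at all last night",
    "Great game today",
    "Feeling lonely lately",
    "sleepless again",
]
matched = filter_posts(posts, keywords)
print(matched)
```

Whole-word matching skips "sleepless" here. If partial matches should count, the `\b` anchors can be dropped.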