Caroline Tomasik

Caroline is a Data Scientist specializing in machine learning and feature importance, with valuable startup experience. She graduated summa cum laude from UCLA with a master's degree in Cognitive Science, with a specialization in Computation. She is known for her exceptional communication skills, which make her an ideal bridge between technical insights and stakeholders.

Master’s Thesis: Finding the Most Influential Features on Airbnb Rates

Created machine learning and deep learning models to predict the nightly rate of Airbnb listings in Los Angeles, then ran a SHAP feature importance analysis on the best-performing model to identify which features most influence price. The figure shows that the number of guests a listing can accommodate and its longitude (i.e., proximity to the coast) are the strongest predictors of nightly rate.
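SHAP attributes a single prediction to its features using Shapley values from cooperative game theory. The shap library estimates these efficiently (for example, with tree-specific algorithms); the sketch below instead computes them exactly by brute force, which is only feasible for a handful of features. The pricing model, feature names, and numbers are hypothetical illustrations, not the thesis model or data, and filling "absent" features with a single baseline value is one simplifying choice among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction of `model` at input `x`.

    Features outside a coalition are set to their `baseline` value.
    Every coalition is enumerated, so cost grows as O(2^n).
    """
    n = len(x)

    def value(coalition):
        # Use the real feature value for coalition members, the baseline otherwise.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy pricing model over
# [guests accommodated, degrees of longitude inland from the coast, bedrooms].
def toy_price_model(z):
    guests, inland, bedrooms = z
    return 60 + 25 * guests - 40 * inland + 15 * bedrooms + 5 * guests * bedrooms


listing = [4, 0.1, 2]  # hypothetical listing
average = [2, 0.3, 1]  # hypothetical baseline ("average" listing)
phi = shapley_values(toy_price_model, listing, average)
print([round(p, 2) for p in phi])  # → [65.0, 8.0, 30.0]
```

The three values sum to the gap between the listing's predicted price and the baseline's (226 - 123 = 103), the "efficiency" property that SHAP attributions also satisfy; the guests-by-bedrooms interaction term is split between those two features.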

Using Web Scraping, NLP, and ML to Classify COVID Articles as Originating from Fox News or CNN

Scraped COVID articles from Fox News and CNN, cleaned them using NLP techniques (removing stop words, lemmatizing, etc.), then used a Random Forest Classifier to predict each article's source with 92% accuracy. The figure above shows the most frequently occurring bigrams (word pairs) for each news site. CNN frequently mentioned the novelty of COVID, while Fox News used such language less often.
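The cleaning and bigram-counting steps can be sketched with the Python standard library alone. This is an illustrative reconstruction, not the project's code: the stop-word list is a tiny hypothetical subset, lemmatization is omitted (a library such as NLTK or spaCy would normally handle it), the sample sentences are made up, and the Random Forest classification step is not shown.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list for illustration only; real pipelines
# typically use a fuller list such as NLTK's or spaCy's.
STOP_WORDS = {
    "a", "an", "and", "as", "by", "for", "in", "is", "it",
    "of", "on", "that", "the", "this", "to", "with",
}


def clean_tokens(text):
    """Lowercase, keep alphabetic tokens, and drop stop words.

    Lemmatization is omitted here to keep the sketch dependency-free.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bigram_counts(articles):
    """Count adjacent word pairs (bigrams) across a list of article texts."""
    counts = Counter()
    for text in articles:
        tokens = clean_tokens(text)
        # zip pairs each token with the one after it.
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Made-up sample sentences standing in for one site's scraped articles.
sample = [
    "The novel coronavirus spread quickly.",
    "Experts studied the novel coronavirus.",
]
print(bigram_counts(sample).most_common(1))  # → [(('novel', 'coronavirus'), 2)]
```

Running `bigram_counts` separately on each site's articles and comparing the top pairs yields a per-site comparison like the one in the figure.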

Creating a Proprietary Customer Identity Graph

Connected disparate consumer data to build a holistic customer profile. The GIF shows how multiple employees are connected to their employer.
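One common way to connect disparate records into a single profile is to treat any shared identifier (such as an email address or phone number) as a link and take connected components with a union-find structure. The sketch below assumes that approach; the record fields, IDs, and employer names are hypothetical, and it is not the proprietary graph's actual matching logic, which may use fuzzier or weighted rules.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure for merging records into identity clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve_identities(records, keys=("email", "phone")):
    """Group record ids that share any identifier value in `keys`."""
    uf = UnionFind()
    first_seen = {}  # (key, value) -> first record id carrying that value
    for rec in records:
        uf.find(rec["id"])  # register every record, even isolated ones
        for key in keys:
            value = rec.get(key)
            if not value:
                continue
            if (key, value) in first_seen:
                uf.union(first_seen[(key, value)], rec["id"])
            else:
                first_seen[(key, value)] = rec["id"]
    groups = defaultdict(set)
    for rec in records:
        groups[uf.find(rec["id"])].add(rec["id"])
    return list(groups.values())


def employees_by_employer(records, profiles):
    """Link each resolved profile to the employer(s) named in its records."""
    by_id = {rec["id"]: rec for rec in records}
    graph = defaultdict(list)
    for profile in profiles:
        employers = {by_id[r].get("employer") for r in profile} - {None}
        for employer in employers:
            graph[employer].append(sorted(profile))
    return dict(graph)


# Hypothetical records from three separate sources (CRM, web, point of sale).
records = [
    {"id": "crm-1", "email": "ana@example.com", "employer": "Acme"},
    {"id": "web-7", "email": "ana@example.com", "phone": "555-0100"},
    {"id": "pos-3", "phone": "555-0100"},
    {"id": "crm-2", "email": "ben@example.com", "employer": "Acme"},
]
profiles = resolve_identities(records)
print(employees_by_employer(records, profiles))
# → {'Acme': [['crm-1', 'pos-3', 'web-7'], ['crm-2']]}
```

In practice, identifiers would be normalized (lowercased emails, standardized phone formats) before matching, since exact string equality misses trivial variations.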

Discovering the Most Profitable Short-Term Rental Locations

Used Python to analyze the effect that location, saturation, and regulation have on rental properties’ revenue. Presented the findings in a live Q&A.

Tracking Trends in Swedish COVID Cases

Presenting Thesis Findings on Podcast

Interactive Dashboard: Understanding the Capital of Top Hospitality Businesses

Cleaned the data with Python, then visualized it in Looker Studio to create this live, interactive dashboard.