NOTABLE WORK
Drawing on my education, certifications, and, above all, hands-on experience in data science, I continually hone my skills by taking on meaningful, full-scale data science projects.
To explore all of my projects in full, please visit my GitHub portfolio.
BUSINESS OPTIMIZATION WITH A DATA-DRIVEN DYNAMIC PRICING MODEL
Extensive data analysis and visualization in Python and Tableau, assisted by generative AI tools.
Sophisticated Tableau analytics (e.g., Calculated Fields, Quick Table Calculations) alongside an interactive dashboard.
Small-scale data engineering with big data tools and a cloud data platform (Databricks) used to build and automate an ETL pipeline.
Data wrangling in SQL (Databricks) using window functions, CTEs, subqueries, complex JOINs, etc.
K-Means clustering with 3-D visualization and cluster analysis with violin plots for optimal customer segmentation and marketing.
Time series analysis and a forecasting model for dynamic pricing, plus statistical testing with ANOVA.
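The K-Means segmentation step described above can be sketched roughly as follows. The spend/frequency/recency features and the three-group structure here are illustrative assumptions, not the project's actual data:

```python
# Hypothetical sketch of K-Means customer segmentation on synthetic
# spend / purchase-frequency / recency features (illustrative data only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Three synthetic customer groups with distinct profiles
features = np.vstack([
    rng.normal([20, 2, 60], [5, 1, 10], size=(50, 3)),   # low-value
    rng.normal([80, 10, 20], [10, 2, 5], size=(50, 3)),  # mid-value
    rng.normal([200, 25, 5], [20, 4, 2], size=(50, 3)),  # high-value
])

# Standardize before clustering so no feature dominates the distance metric
scaled = StandardScaler().fit_transform(features)
model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(scaled)
labels = model.labels_
print("cluster sizes:", np.bincount(labels))
```

In practice the resulting labels would feed the violin-plot cluster analysis and the per-segment pricing decisions.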
DRIVING OMNICHANNEL CONVERSION THROUGH CLICKSTREAM INTELLIGENCE
Applied K-Means clustering to clickstream data from various sources, testing multiple cluster configurations and evaluating model stability to identify the optimal segmentation (k = 3). Interpreted behavioral clusters to distinguish high-intent, browsing, and low-engagement user cohorts, enabling targeted merchandising and conversion strategies.
Conducted ANOVA on engagement and purchase metrics across customer segments.
Implemented statistical hypothesis testing (Chi-Squared Test of Independence) to evaluate whether product page order influenced viewer engagement and interest; results informed recommendations for optimized product placement and ranking logic.
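A minimal sketch of the Chi-Squared Test of Independence described above, using made-up contingency counts (the page-position and click categories are assumptions for illustration):

```python
# Illustrative Chi-Squared Test of Independence: does product page
# position relate to engagement? Counts below are fabricated examples.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: page position (top, middle, bottom); columns: clicked, not clicked
observed = np.array([
    [320, 680],
    [210, 790],
    [150, 850],
])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, p={p_value:.4g}, dof={dof}")
if p_value < 0.05:
    print("Reject independence: page order appears related to engagement.")
```

A significant result like this would support recommendations on product placement and ranking logic.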
DATA ENGINEERING: BUILDING A DATA WAREHOUSE FROM SCRATCH
Architected an end-to-end data warehouse using the Medallion Architecture (Bronze–Silver–Gold), transforming fragmented source data into a scalable, analytics-ready "single source of truth," built mainly in Databricks with PySpark.
Built robust ETL pipelines and data models (fact/dimension tables, star schema) with SQL, incorporating data quality checks, validation, and metadata tracking to ensure reliability, traceability, and performance.
Designed business-aligned data architecture enabling efficient querying, BI reporting, and downstream analytics, bridging data engineering with decision-making use cases.
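The Bronze–Silver–Gold flow can be sketched in miniature. This sketch uses SQLite as a stand-in for Databricks, and the table and column names are hypothetical:

```python
# Minimal Medallion Architecture sketch: raw Bronze rows are deduplicated
# and validated into Silver, then aggregated into an analytics-ready Gold
# table. SQLite stands in for Databricks; names are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Bronze: raw ingested rows, kept as-is (note the duplicate and NULL price)
cur.execute("CREATE TABLE bronze_sales (order_id, product, price)")
cur.executemany("INSERT INTO bronze_sales VALUES (?, ?, ?)", [
    (1, "widget", 9.99),
    (1, "widget", 9.99),   # duplicate record
    (2, "gadget", None),   # fails the quality check
    (3, "widget", 4.50),
])

# Silver: deduplicated, validated records
cur.execute("""
    CREATE TABLE silver_sales AS
    SELECT DISTINCT order_id, product, price
    FROM bronze_sales
    WHERE price IS NOT NULL
""")

# Gold: a small analytics-ready aggregate
cur.execute("""
    CREATE TABLE gold_revenue AS
    SELECT product, SUM(price) AS revenue, COUNT(*) AS orders
    FROM silver_sales
    GROUP BY product
""")
rows = cur.execute(
    "SELECT product, revenue, orders FROM gold_revenue ORDER BY product"
).fetchall()
print(rows)
```

The same layering idea scales up in PySpark, with the quality checks and metadata tracking applied at the Bronze-to-Silver boundary.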
FOOTBALLYZING: ADVANCED ANALYTICS & PREDICTIVE MODELING
Advanced analytics and visualizations, including pivot tables, line plots with mean lines, and pairplots for correlation analysis across all numerical variables.
Regularization applied to the regression model, using L1 (Lasso) and L2 (Ridge) to address feature selection and multicollinearity, respectively.
Machine learning models built to predict future La Liga winners and assign win probabilities.
Predictive models evaluated with metrics and diagnostics such as Mean Squared Error, R-squared, Precision, F1-score, and the Confusion Matrix.
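The Lasso/Ridge regularization step can be sketched as below. The data is synthetic, not the La Liga dataset: one feature is irrelevant (to show L1's feature selection) and two are near-duplicates (to show L2's handling of multicollinearity):

```python
# Hedged sketch of L1/L2 regularization: Lasso zeroes out the irrelevant
# feature, Ridge splits weight across the collinear pair. Synthetic data.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 3] + rng.normal(scale=0.01, size=200)  # near-duplicate feature
# True model uses features 0, 1, 3; feature 2 is pure noise
y = 3 * X[:, 0] + 2 * X[:, 1] + 1.5 * X[:, 3] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1: drives weak coefficients to zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks and shares collinear weight

print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
```

Note how Lasso sets the noise feature's coefficient to (near) zero, while Ridge keeps the collinear pair's combined weight close to the true 1.5.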
DECODING THE BEAUTIFUL GAME: A DEEP DIVE INTO THE HISTORICAL TRENDS OF EUROPE'S FOOTBALL LEAGUES
Web-scraped and aggregated league results data covering all seasons of Europe's top six football (soccer) leagues.
Extensive EDA and a variety of insightful visuals created in Python, including, but not limited to, regression plots, bar charts, and histograms with KDEs.
Applied feature engineering for use in machine learning and predictive modeling.
OLS Regression, K-Means Clustering, and Random Forest Classifier algorithms used for historical team-tier classification.
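The Random Forest tier-classification idea can be sketched as follows. The points and goal-difference features and the three tiers are illustrative assumptions, not the scraped league data:

```python
# Hypothetical sketch: classify teams into historical tiers with a
# Random Forest, using synthetic points/goal-difference features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
points = np.concatenate([rng.normal(40, 5, 60), rng.normal(60, 5, 60), rng.normal(85, 5, 60)])
goal_diff = np.concatenate([rng.normal(-15, 5, 60), rng.normal(5, 5, 60), rng.normal(35, 5, 60)])
X = np.column_stack([points, goal_diff])
y = np.repeat([0, 1, 2], 60)  # bottom / mid / top tier labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7, stratify=y)
clf = RandomForestClassifier(n_estimators=100, random_state=7).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(f"holdout accuracy: {accuracy:.2f}")
```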
INNOVATING THE SOCIAL SCIENCES WITH CUTTING-EDGE DATA SCIENCE
Organized tables in Excel, conducted data verification, and applied advanced formulas (e.g., XLOOKUP, ISNUMBER with MATCH).
Used PostgreSQL for deeper data wrangling and pattern detection.
Used Python for exploratory data analysis (EDA) and insightful visuals such as heatmaps.
Built the final multiple regression model and evaluated key descriptive statistics in R.
GOOGLE CAPSTONE PROJECT: CYCLISTIC RIDE SHARE ANALYSIS
Organized tables in Excel to manipulate data and find pertinent information by filtering, creating Pivot Tables, etc.
Utilized Excel, SQL, and Tableau to analyze more than 1 million rows of bike-usage data across member types and detect consumer patterns worth targeting.
Created engaging visualizations and advised the marketing team on future strategies to increase customer engagement.
COVID-19: A COMPREHENSIVE ANALYSIS
Guided Personal Project
Calculated COVID case-spread and death rates across the world to identify the most-afflicted and best-coping nations, using Excel and SQL.
Created engaging visualizations on Tableau to portray vital statistics regarding COVID across the world.
Presented key takeaways and plans of action to better counter the pandemic.







