About

About Me

I am a results-driven Data Engineer and Analyst with 4+ years of experience building efficient data pipelines, ETL processes, and data visualizations. I am currently pursuing a Master's in Data Science at the University of New Haven and have a strong background in Python, PySpark, SQL, Power BI, and Tableau. Passionate about data-driven decision-making, I am actively seeking full-time opportunities in Data Engineering, Data Analytics, or related fields where I can apply my skills to optimize data workflows, enhance business intelligence, and contribute to innovative projects.

  • Profile: Data Science, Data Analyst & Data Engineer
  • Education: Pursuing a Master's in Data Science
  • Languages: English, Hindi & Gujarati
  • BI Tools: Microsoft Power BI & Tableau
  • Other Skills: Cloud, PySpark, Excel, GitHub & Snowflake
  • Certifications: AZ-104: Azure Administrator, PL-300: Microsoft Power BI Data Analyst Associate, 1Z0-071: Oracle Database SQL Associate
  • Interests: Traveling, Netflix, Cricket & Football

Skills

  • Python
  • SQL
  • Data Visualization and Cleaning
  • AWS (Amazon Web Services)
  • UNIX
  • Machine Learning

Projects

Below are some of the projects I have developed over the years.

Laptop Price Prediction

Created a machine learning model to accurately predict laptop prices, utilizing data cleaning, visualization, and feature selection techniques to enhance model performance.

AI-Powered Dots and Boxes Game

Implemented an AI opponent for the Dots and Boxes game using the Monte Carlo Tree Search (MCTS) algorithm. Enhanced gameplay by optimizing AI performance through dynamic strategy adaptation and win-loss ratio evaluation.

Twitter Data Analysis using Airflow

Developed a robust ETL pipeline on AWS EC2 using Apache Airflow for Twitter data, improving data integrity and processing efficiency. Utilized Amazon QuickSight to create dynamic visualizations, enabling advanced analytics and informed decision-making.


Biomedical Named Entity Recognition (NER)

Conducted a comparative analysis of transformer-based models (BioBERT, BERT) combined with BiLSTM and CRF for Biomedical NER on the BC5CDR dataset, achieving a 90% F1-score using BioBERT with CRF.

Campus Image Semantic Segmentation

Implemented a semantic segmentation model using DeepLabV3 with a ResNet-50 backbone to classify campus images into doors, stairs, and background, achieving 63% mIoU and 83% pixel-wise accuracy.

Superstore Data Analysis Dashboard

Built an interactive Power BI dashboard to analyze Superstore sales, returns, and trends, providing actionable insights for business decisions and enabling identification of growth opportunities.


More projects on GitHub

I enjoy tackling business challenges and revealing the untold stories hidden within data.


GitHub

Resume

Experience

August 2019 - March 2023

Data Engineer

HCL Technologies

HCLTech is a global leader in IT services and consulting, renowned for its expertise in digital, engineering, and cloud services.

  • Led the development of scalable ETL pipelines using Apache Spark, increasing data processing efficiency by 20% and reducing latency in downstream analytics.
  • Optimized Spark SQL queries, reducing execution time from 2 hours to just 10 minutes, saving over 100 compute hours per month and cutting infrastructure costs by 30%.
  • Automated data ingestion and transformation tasks using Python and AWS Lambda, reducing manual intervention by 80% and improving workflow efficiency.
  • Designed and automated data pipelines to feed interactive reports and dashboards using Power BI and Tableau, ensuring real-time data availability and seamless integration with analytics workflows.
  • Processed and maintained datasets exceeding 10TB, ensuring high data integrity and availability for analytics and reporting teams.
  • Authored and maintained technical documentation for 100% of data sources and ETL pipelines, improving team collaboration and onboarding efficiency.

November 2018 - August 2019

Junior Data Analyst

Savantis Solution Pvt Ltd

Savantis is a full-service SAP global systems integrator focused on one common goal—success!

  • Increased reporting efficiency by 40% by automating data extraction and visualization using Power BI and Tableau, enabling faster decision-making for stakeholders.
  • Improved data accuracy by 35% by optimizing SQL queries and implementing data validation checks, ensuring reliable business insights.
  • Simplified complex data insights for non-technical stakeholders, improving communication and facilitating strategic decision-making.
  • Conducted root cause analysis for 50+ critical incidents, implementing preventative solutions that reduced recurrence by 70%.



Education


2023-2025

Master's in Data Science

University of New Haven

Grade: 3.94 (to date)

2013-2017

Bachelor of Engineering

Rajiv Gandhi College of Engineering Research and Technology

Grade: First Division with Distinction

Certifications

Below are some of the certifications I have obtained.

AZ-104: Azure Administrator

Microsoft Azure Administrator Certification showcasing expertise in cloud solutions and infrastructure.

PL-300: Power BI Data Analyst

Microsoft Power BI certification for data visualization, analytics, and business intelligence.

1Z0-071: Oracle Database SQL Associate

Oracle Database SQL certification validating skills in SQL fundamentals and database management.

Contact

Contact Me

Below are the details to reach out to me!

Contact Number

+1(973)380-6309

Email Address

yashbraythatha@gmail.com

Download Resume

Resume