ANSHUL DHAWAN

[email protected] +91-9873919352 Gurugram, India
LinkedIn: https://linkedin.com/in/anshul-dhawan089244130/

PROFESSIONAL SUMMARY

Results-driven Data Engineer with over 4 years of experience designing and optimizing scalable ETL pipelines using Spark, Scala, and Python. Proven track record of processing billions of financial and clickstream records daily to enable data-driven decision-making and automate reporting. Skilled in cloud platforms including AWS (S3, Redshift) and GCP (BigQuery, Dataproc), with hands-on expertise in orchestration tools like Apache Airflow. Adept at collaborating with cross-functional teams to deploy machine learning models and deliver high-quality master data, driving product enhancements and operational efficiency.

WORK EXPERIENCE

Data Engineer 2
08/2023 - Present
Expedia (via TEKsystems), Gurugram, India
Designed and developed scalable ETL pipelines using Spark and Scala for ingesting data from multiple sources (Teradata, S3) into distributed systems, ensuring efficient data retrieval and availability for stakeholders
Built and optimized data transformation services to produce high-quality, clean master data for downstream applications and data models, enabling data-driven decision-making across the organization
Collaborated with Data Science teams to deploy machine learning models at scale, integrating them with data pipelines and user services to support product enhancements
Scheduled and monitored complex workflows with Apache Airflow, improving the efficiency and reliability of ETL pipelines, reducing downtime, and ensuring timely data availability
Worked with RDBMS, NoSQL, and analytics databases, including Redis and Amazon Redshift, for efficient storage and retrieval in large-scale data processing workloads
Used cloud data processing services on AWS (S3, Redshift) and GCP (BigQuery, GCS, Dataproc) to optimize storage and processing for large-scale distributed systems
Data Engineer
03/2022 - 07/2023
Paytm, Noida, India
Developed data pipelines using Spark, Scala, and Python
Processed billions of financial records daily across multiple products to automate reporting that supports business teams
Automated Data Science models to support the DS team and deliver data to downstream models and pipelines on time
Created an automated reconciliation (recon) process to identify gaps among financial data sources and email the identified gaps to stakeholders
Wrote a common utility to write Spark DataFrames directly to Datalake, an internal tool, removing the dependency on an external framework
Used AWS S3 as the storage layer
Orchestrated jobs using Airflow DAGs (PythonOperator, LivyOperator, etc.)
Ingested and processed data using Spark, Scala, and Python
Software Engineer
07/2019 - 03/2022
Airtel x-Labs, Gurugram, India
Developed a generic, scalable Spark framework in PySpark to handle billions of records
Ingested and processed data using Spark
Optimized Spark jobs to avoid resource overutilization
Built and unit-tested components against specifications
Gained exposure to Airflow (DAGs, operators, etc.)
Ensured quality through thorough testing and by giving and receiving peer reviews
Developed aggregates on clickstream data using complex SQL queries per business requirements
Built a pipeline to ingest continuous data from a Solace queue into Hive and Exadata
Interacted with business users, analysts, data scientists, and product customers to gather requirements and functional specifications
Extracted and loaded data from/to Exadata, HDFS, and Hive per business specifications
Gained exposure to developing ETL pipelines with Ab Initio

EDUCATION

B.Tech in Computer Science
01/2015 - 01/2019
Bharati Vidyapeeth’s College of Engineering, New Delhi, India | GPA: 7.73
12th Non-Medical (PCM)
01/2015
Hansraj Model School, Punjabi Bagh, New Delhi, India | Percentage: 83.60%
10th
01/2013
Hansraj Model School, Punjabi Bagh, New Delhi, India | GPA: 9.2/10.00

SKILLS

PROJECTS

Analysis and Prediction of Suicide Attempt
Analyzed a suicide dataset to identify significant attributes contributing to suicide attempts and to predict future attempts with high precision
Compared the accuracy of three models: logistic regression, naïve Bayes, and random forest
Emoji Interpreter
Technologies: ReactJS
A web app that interprets emojis and displays their meanings
App link: //App
Source code: Github://Emoji-app
Cricket Fan Contest
Technologies: JavaScript
A CLI quiz app for cricket fans built using JavaScript
App link: //App
Source code: Github://fan-contest
