• Designed and developed scalable ETL pipelines in Spark and Scala to ingest data from multiple sources (Teradata, S3) into distributed systems, ensuring efficient retrieval and timely data availability for stakeholders
• Built and optimized data transformation services that produce clean, high-quality master data for downstream applications and data models, enabling data-driven decision-making across the organization
• Collaborated with Data Science teams to deploy machine learning models at scale, integrating them with data pipelines and user-facing services to support product enhancements
• Scheduled and monitored complex workflows with Apache Airflow, improving the reliability and efficiency of ETL pipelines, reducing downtime, and ensuring timely data delivery
• Applied hands-on experience with RDBMS, NoSQL, and analytics databases, including Redis and Amazon Redshift, to deliver efficient storage and retrieval for large-scale data processing workloads
• Optimized data storage and processing for large-scale distributed systems using cloud services on AWS (S3, Redshift) and GCP (BigQuery, GCS, Dataproc)