•
Led a team of SREs to maintain and improve the reliability of critical systems, achieving 99.99% uptime
•
Developed and implemented incident response protocols, reducing mean time to recovery (MTTR)
•
Collaborated with software engineering teams to design and deploy scalable infrastructure solutions
•
Automated routine tasks using scripting languages (Python, Bash), saving 20+ hours of manual work per week
•
Conducted post-mortem analyses and implemented preventive measures, reducing the frequency of recurring incidents by 40%
•
Led observability implementation across infrastructure, APM, logs, latency, exceptions, and alerts using BigPanda, Prometheus, Grafana, Azure, and Splunk
•
Integrated AI-driven anomaly detection and forecast alerts, improving latency tracking, issue resolution, and performance optimization
•
Implemented business process and compliance monitoring, ensuring 100% security adherence and proactive anomaly detection
•
Enabled Splunk SIEM capabilities to support security investigations, threat detection, and compliance use cases within banking and financial environments
•
Built and upskilled teams through hiring, training, and knowledge-sharing sessions on observability and cloud automation
•
Created the roadmap, defined delivery milestones, and planned capacity and cost to meet the goals set
•
Designed and deployed highly available Splunk Enterprise infrastructure by configuring indexer clusters, search head clusters, deployment servers, the deployer, forwarders, and license managers
•
Managed Splunk licensing, indexers, search heads, configuration, and capacity; monitored the environment using custom Splunk queries, dashboards, and tailored Splunk apps
•
Onboarded new data sources, parsed and extracted relevant data, and developed meaningful ways to display that data
•
Provided operational and technical support to users to identify and resolve issues, and created and configured management reports and dashboards
•
Expertise in Splunk data ingestion, parsing, indexing, searching, reporting, alerting, and dashboard creation using SPL