•
Managed and continuously monitored cloud‑based systems and production environments for high availability, stability, and performance, using Grafana and Zabbix alerts for real‑time visibility.
•
Led end‑to‑end management of P1 and P2 incidents, from rapid identification and troubleshooting through root cause analysis and resolution, to minimize business impact and downtime.
•
Used Jira to track incidents, changes, and follow‑up actions, ensuring clear ownership, accurate prioritization, and timely resolution across support and engineering teams.
•
Contributed to change management by coordinating releases, patches, and new deployments with customers, reducing risk and preventing service disruption in production environments.
•
Served as the primary point of contact for customer communication during incidents and service activities, providing timely updates, status reports, and post‑incident follow‑ups to maintain transparency and trust.
•
Created and maintained detailed technical documentation covering client environments, release procedures, incident runbooks, and support workflows to improve knowledge sharing and operational efficiency.
•
Collaborated with internal teams and customers to identify risks early, improve monitoring coverage, and enhance system reliability.