Infrastructure Monitoring Engineer
Role Summary
The Infrastructure Monitoring Engineer is responsible for 24x7 monitoring, alert management, incident triage, and first-level root cause analysis across enterprise IT infrastructure and applications, ensuring service availability, performance, and SLA compliance.
Key Responsibilities
· Perform real-time monitoring of infrastructure and applications using enterprise monitoring tools (APM, NPM, infrastructure monitoring).
· Proactively detect, analyze, and respond to alerts related to servers, databases, networks, APIs, and application health.
· Conduct initial triage and impact assessment of incidents and coordinate with L2/L3 infra, application, and vendor teams.
· Monitor CPU, memory, disk, network, JVM, database, and API metrics and identify abnormal trends.
· Validate alerts, reduce false positives, and support alert tuning and threshold optimization.
· Track incidents end-to-end, ensure timely escalation, and maintain SLA/OLA adherence.
· Support change, deployment, and maintenance activities from a monitoring readiness perspective.
· Prepare daily health reports, dashboards, and management summaries.
· Assist in RCA activities by providing logs, metrics, timelines, and monitoring insights.
· Support audit and compliance requirements by providing monitoring evidence and reports.
Technical Skills Required
· Monitoring Tools: Dynatrace, VuNet, AppDynamics, Nagios, Zabbix, or equivalent
· Infrastructure:
o Servers: Windows Server, Linux
o Network: Basic understanding of TCP/IP, latency, packet loss, VLANs
· Application Monitoring: JVM, API response times, error rates, service availability
· Database Monitoring (Basic): Oracle / MS SQL / MySQL (connectivity, performance metrics)
· ITSM Tools: ServiceNow or equivalent (Incident, Problem, Change)
· Log Analysis: Kibana / ELK / Splunk (basic)
· Cloud Exposure (Good to have): Azure / AWS monitoring concepts