Observability Engineer
Job Title: Observability Engineer
Total experience: 4+ Years
Location: Jaipur (work from office, 5.5 days a week)
Role Summary
The Observability Engineer will be responsible for designing, implementing, and managing end‑to‑end observability across applications, infrastructure, middleware, databases, networks, and cloud platforms. The role focuses on proactive monitoring, performance optimization, incident reduction, and enabling faster RCA through metrics, logs, traces, and dashboards.
This role works closely with Application Support, Command Center, Infra, Network, SRE, DevOps, and Vendors to ensure high availability of critical banking services.
Key Responsibilities
Observability & Monitoring
Design and maintain end‑to‑end observability for critical applications and platforms.
Implement and manage metrics, logs, traces, and events across:
Applications & APIs
Middleware (ESB, MQ, API Gateway)
Databases
Infrastructure & Network
Cloud (AWS / Azure / OCI, if applicable)
Build and maintain service‑based and journey‑based dashboards (Login, Transactions, Payments, etc.).
Tools & Platforms
Hands‑on experience with observability tools such as:
APM / Observability: Dynatrace, Datadog, New Relic, AppDynamics, Elastic
Visualization: Grafana, Kibana
Cloud-native: AWS CloudWatch / Azure Monitor
Enterprise tools (VuNet, ManageEngine, etc. – added advantage)
Configure thresholds, baselines, anomaly detection, and alerts.
Incident & RCA Support
Proactively identify performance degradation and potential outages.
Support P1/P2 incident analysis using distributed tracing and correlation.
Perform root cause analysis (RCA) and share improvement recommendations.
Reduce alert noise through alert rationalization and tuning.
Integration & Automation
Integrate observability tools with ITSM tools (ServiceNow / Remedy, etc.).
Enable auto‑ticketing, alert enrichment, and correlation.
Support automation for health checks, reports, and dashboards.
Work closely with:
Application & Infra teams
Network & Cloud teams
Command Center / NOC
Vendors & OEMs
Participate in architecture reviews and new application onboarding.
Required Skills & Experience
Technical Skills
Strong understanding of Observability concepts (Metrics, Logs, Traces).
Hands‑on experience with APM & Monitoring tools.
Good understanding of:
Distributed systems & microservices
APIs, ESB, message queues
Linux / Windows systems
Databases (Oracle / MySQL / PostgreSQL – basic monitoring-level understanding)
Knowledge of cloud monitoring and hybrid environments.
Operational Knowledge
ITIL‑based incident management (P1–P4).
Experience working in 24x7 production environments.
Understanding of SLAs, SLIs, SLOs, MTTR, availability metrics.
Good to Have (Preferred)
Banking / Financial Services domain exposure.
Experience with SRE practices.
Scripting knowledge (Python / Shell) for automation.
Experience with synthetic monitoring and user journey monitoring.
Soft Skills
Strong analytical and troubleshooting skills.
Clear communication for technical & non‑technical stakeholders.
Ability to work under pressure during incidents.
Documentation and reporting skills.