Cosette Network

Observability Engineer

Jaipur · Onsite · 4+ years

Job Description

Job Title: Observability Engineer

Total experience: 4+ Years

Location: Jaipur (work from office, 5.5 days a week)

Role Summary

The Observability Engineer will be responsible for designing, implementing, and managing end‑to‑end observability across applications, infrastructure, middleware, databases, networks, and cloud platforms. The role focuses on proactive monitoring, performance optimization, incident reduction, and enabling faster RCA through metrics, logs, traces, and dashboards.

This role works closely with Application Support, Command Center, Infra, Network, SRE, DevOps, and Vendors to ensure high availability of critical banking services.


Key Responsibilities

Observability & Monitoring

  • Design and maintain end‑to‑end observability for critical applications and platforms.

  • Implement and manage metrics, logs, traces, and events across:

    • Applications & APIs

    • Middleware (ESB, MQ, API Gateway)

    • Databases

    • Infrastructure & Network

    • Cloud (AWS / Azure / OCI, if applicable)

  • Build and maintain service‑based and journey‑based dashboards (Login, Transactions, Payments, etc.).

Tools & Platforms

  • Hands‑on experience with observability tools such as:

    • APM / Observability: Dynatrace, Datadog, New Relic, AppDynamics, Elastic

    • Visualization: Grafana, Kibana

    • Cloud-native: AWS CloudWatch / Azure Monitor

    • Enterprise tools: VuNet, ManageEngine, etc. (an added advantage)

  • Configure thresholds, baselines, anomaly detection, and alerts.

Incident & RCA Support

  • Proactively identify performance degradation and potential outages.

  • Support P1/P2 incident analysis using distributed tracing and correlation.

  • Perform root cause analysis (RCA) and share improvement recommendations.

  • Reduce alert noise through alert rationalization and tuning.

Integration & Automation

  • Integrate observability tools with ITSM tools (ServiceNow / Remedy, etc.).

  • Enable auto‑ticketing, alert enrichment, and correlation.

  • Support automation for health checks, reports, and dashboards.

Collaboration

  • Work closely with:

    • Application & Infra teams

    • Network & Cloud teams

    • Command Center / NOC

    • Vendors & OEMs

  • Participate in architecture reviews and new application onboarding.


Required Skills & Experience

Technical Skills

  • Strong understanding of Observability concepts (Metrics, Logs, Traces).

  • Hands‑on experience with APM & Monitoring tools.

  • Good understanding of:

    • Distributed systems & microservices

    • APIs, ESB, message queues

    • Linux / Windows systems

    • Databases (Oracle / MySQL / PostgreSQL – basic monitoring view)

  • Knowledge of cloud monitoring and hybrid environments.

Operational Knowledge

  • ITIL‑based incident management (P1–P4).

  • Experience working in 24x7 production environments.

  • Understanding of SLAs, SLIs, SLOs, MTTR, availability metrics.


Good to Have (Preferred)

  • Banking / Financial Services domain exposure.

  • Experience with SRE practices.

  • Scripting knowledge (Python / Shell) for automation.

  • Experience with synthetic monitoring and user journey monitoring.


Soft Skills

  • Strong analytical and troubleshooting skills.

  • Clear communication for technical & non‑technical stakeholders.

  • Ability to work under pressure during incidents.

  • Documentation and reporting skills.


Skills

Required

observability, APM, monitoring, distributed systems, microservices, APIs, ESB, message queues, Linux, Windows, Oracle, MySQL, PostgreSQL, cloud monitoring

Preferred

banking, financial services, SRE, Python, shell, synthetic monitoring, user journey monitoring