Conversational AI Agent Model Evaluation with AI
Job Title: Senior Conversational AI Agent Model Evaluation with AI
Location: Noida / Gurgaon / Pune / Bangalore
Experience: 6-14 Years
Role Overview
We are seeking a highly skilled senior specialist in AI-driven evaluation of conversational AI agent models. You will lead end-to-end testing and quality assurance of Voice AI / Conversational AI solutions across industries such as Healthcare, Banking, Insurance, and Utilities, with a primary focus on US-based customer journeys.
This role combines domain understanding, AI-driven testing, simulation design, and insight generation to ensure enterprise-grade performance, compliance, and customer experience for Voice AI deployments.
---
Key Responsibilities
1. Conversational AI & Domain Understanding
· Develop a deep understanding of Voice AI / Conversational AI architectures, including NLU/NLP pipelines, dialog management, and integrations.
· Analyze customer journeys across industries (Healthcare, BFSI, Insurance, Utilities, etc.) with strong awareness of US market nuances, compliance requirements, and user behavior patterns.
· Collaborate with product, design, and development teams to ensure test scenarios align with real-world customer journeys.
---
2. AI-driven Testing & Tool Utilization
· Lead end-to-end testing using AI-powered quality platforms, such as:
o Cyara, Hammer, Bluejay, or similar tools
· Design, execute, and manage:
o Functional testing
o Conversational flow validation
o Regression testing for NLP models
o Load and simulation testing for voice systems
· Ensure coverage across multi-turn conversations, edge cases, and failure scenarios.
---
3. Simulation, Evals & Custom Metrics Design
· Configure and run simulation frameworks to mimic real user interactions at scale.
· Define and implement:
o Evals (evaluation frameworks)
o Custom quality metrics (intent accuracy, containment, fallback rates, sentiment proxies, etc.)
· Align evaluation metrics with business KPIs and customer experience goals.
· Leverage the AI agent’s knowledge base and training data to design realistic test scenarios.
---
4. Transcript Analysis & Insight Generation
· Perform deep analysis of:
o Call transcripts
o Conversation logs
o AI-generated responses
· Identify:
o Intent misclassification
o Dialog breakdown points
o Knowledge gaps
o UX friction points
· Convert findings into structured recommendations for product, design, and engineering teams.
---
5. Dashboarding & Reporting
· Design and build insight-driven dashboards to:
o Highlight defects and performance gaps
o Quantify customer impact
o Track quality trends over time
· Present actionable insights to:
o Client stakeholders (business impact)
o Development teams (technical root cause)
· Enable data-driven prioritization of fixes and enhancements.
---
Preferred Experience & Qualifications
Experience
· 6–10 years of experience in:
o QA / Testing / Quality Engineering
o Conversational AI / Voice Bots / Contact Center Automation
· Hands-on experience with:
o AI testing platforms (Cyara, Hammer, Bluejay, or equivalent)
o Simulation frameworks and conversational testing tools
· Experience working with US clients or products serving US customers is highly preferred.
---
Educational Qualification
· B.Tech / BE (Computer Science, IT, Electronics, or related field)
---
Core Skills
Technical Skills
· Strong understanding of:
o NLP/NLU concepts (intent, entity recognition, confidence scores)
o Voice AI systems and telephony integrations
· Experience in:
o Test automation frameworks
o Data analysis (Excel, Python, or equivalent tools preferred)
o Dashboarding tools (Power BI, Tableau, Looker, etc.)
---
Functional & Analytical Skills
· Ability to connect technical defects with business/customer impact
· Strong analytical mindset with experience in transcript-driven insights
· Experience defining custom KPIs and evaluation metrics
---
Soft Skills
· Strong stakeholder communication (client and internal teams)
· Ability to translate insights into clear action plans
· Proactive problem-solving approach with attention to detail