MortgageQ AI

Reducing AI Response Latency by 91%

MortgageQ AI transformed its AI-powered mortgage intelligence platform by migrating from the Vercel AI SDK + OpenAI to LangGraph and Groq Cloud. The move cut response time by more than 91% (from 120 seconds to under 10 seconds) and reduced cost per query by 40-60% through deterministic parallel execution and a zero-dependency architecture.


Business Intelligence & Market Research SaaS 

  • Market Size: Global BI software market: $47B (2024) → $168B (2035). AI software market: $174B (2024) → $1.3T (2030). 


  • Technology Stack: React, Tailwind UI, Python, Node.js, NLP-based AI, News APIs (80+ sources), Commodities API, FRED, LinkedIn Signals, SEC Filings 


  • Client Profile: Leadership teams in facilities management, construction, manufacturing, and mining sectors with high exposure to raw material volatility and regulatory changes. 

Key Highlights:
  • 91%+ latency reduction (120s → <10s average response time) 

  • 40-60% cost reduction per query through infrastructure optimization 

  • Sub-10 second AI responses enabling real-time decision support 

  • Parallel execution architecture replacing sequential tool calls 

  • Deterministic workflows eliminating hallucination retries and tool loops 


Challenge Faced:

Performance Crisis: 

The existing system delivered an average response time of 120 seconds per complex query, an unacceptable delay for mortgage advisors working with time-sensitive borrower inquiries. Users frequently abandoned sessions before receiving answers, creating poor engagement metrics and limiting platform adoption.

Technical Root Causes: 

  • Sequential execution: Lender searches, rate lookups, and eligibility checks ran one after another instead of in parallel.

  • SDK overhead: Extra abstraction layers added latency without real benefit.

  • Tool loops: The model often retried tool selection, burning tokens on meta-decisions instead of retrieval.

  • Hallucination retries: Bad or incomplete answers triggered repeated tool calls and inferences, further slowing responses.

  • API delays: Database access was routed through unnecessary API layers instead of direct queries.

  • High costs: Long runtimes, retries, and inefficient tooling drove up per-query costs.
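Of these, the sequential-execution bottleneck is the easiest to illustrate. In the sketch below, the three fetch functions and their latencies are illustrative stand-ins (not the production code); it shows how fanning the lookups out concurrently with `asyncio.gather` bounds total latency by the slowest single call rather than the sum of all of them:

```python
import asyncio
import time

# Hypothetical fetch stubs standing in for the real lender-search,
# rate-lookup, and eligibility-check calls (names are illustrative).
async def search_lenders(query: str) -> str:
    await asyncio.sleep(0.1)  # simulated network latency
    return f"lenders for {query}"

async def lookup_rates(query: str) -> str:
    await asyncio.sleep(0.1)
    return f"rates for {query}"

async def check_eligibility(query: str) -> str:
    await asyncio.sleep(0.1)
    return f"eligibility for {query}"

async def sequential(query: str) -> list[str]:
    # One call after another: total latency is the SUM of the parts.
    return [
        await search_lenders(query),
        await lookup_rates(query),
        await check_eligibility(query),
    ]

async def parallel(query: str) -> list[str]:
    # asyncio.gather runs the calls concurrently: total latency is
    # roughly the slowest single call, not the sum.
    return list(await asyncio.gather(
        search_lenders(query),
        lookup_rates(query),
        check_eligibility(query),
    ))

start = time.perf_counter()
asyncio.run(sequential("30yr fixed-rate"))
seq_t = time.perf_counter() - start

start = time.perf_counter()
results = asyncio.run(parallel("30yr fixed-rate"))
par_t = time.perf_counter() - start

print(f"sequential: {seq_t:.2f}s, parallel: {par_t:.2f}s")
```

With three 100 ms calls, the sequential path takes roughly 300 ms while the parallel path takes roughly 100 ms; the same ratio compounds across the many round-trips in a complex query.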

Business Impact: 

The 120-second delays eroded credibility and left advisors hanging mid-conversation, while the high cost per query threatened unit economics and made it difficult to scale the platform sustainably.


Our Solution:

JupiterBrains executed a two-phase AI re-architecture strategy, optimizing both orchestration and inference layers while eliminating legacy dependencies. 


Phase 1: New Marketplace Agent 

Objective: Achieve immediate performance gains while maintaining production stability. 

Implementation: Replaced the Vercel AI SDK with LangGraph for finer control over agent orchestration. Switched from OpenAI to Groq Cloud (241 tokens/sec on LPU hardware) and parallelized lender searches.

Results: Response time dropped from 120s to 30s, a 75% improvement, validating the approach and improving the user experience while Phase 2 was built.


Phase 2: Agent Alpha

Objective: Sub-10 second responses via full architectural optimization.

Key Changes:

  • Zero-dependency build with rule-based routing—no LLM tool loops.

  • Parallel atomic execution: lender searches, rates, and checks run simultaneously with direct Supabase access.

  • Groq LPU migration: prompts optimized for hardware-accelerated inference, achieving maximum throughput with fewer tokens.

  • Hybrid vector + full-text search for relevance and coverage.
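The rule-based routing above can be sketched as a deterministic pattern table. The handler names and regex patterns below are illustrative assumptions, not the production routing logic; the point is that the same query always takes the same path, with no LLM tool-selection step to retry:

```python
import re
from typing import Callable

# Illustrative handlers; the real system would fan out parallel
# searches against Supabase. These names are assumptions.
def handle_rates(query: str) -> str:
    return "rate lookup"

def handle_eligibility(query: str) -> str:
    return "eligibility check"

def handle_lender_search(query: str) -> str:
    return "lender search"

# Deterministic first-match routing table: regex pattern -> handler.
ROUTES: list[tuple[re.Pattern, Callable[[str], str]]] = [
    (re.compile(r"\b(rate|apr|interest)\b", re.I), handle_rates),
    (re.compile(r"\b(qualify|eligib\w*|credit score)\b", re.I), handle_eligibility),
]

def route(query: str) -> str:
    # No model call: the first matching rule wins, so routing is
    # deterministic and costs zero tokens.
    for pattern, handler in ROUTES:
        if pattern.search(query):
            return handler(query)
    return handle_lender_search(query)  # deterministic fallback

print(route("What interest rate can I get?"))     # rate lookup
print(route("Will I qualify with a 640 score?"))  # eligibility check
print(route("Show me lenders in Texas"))          # lender search
```

Trading an LLM router for a static table gives up some flexibility on out-of-distribution queries, but removes an entire class of latency and cost: tool-selection retries simply cannot happen.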

QA & Deployment: Validated outputs against the prior system and tracked latency on P50/P95/P99 dashboards. Deployed with Groq endpoints, LangGraph orchestration, and a direct Supabase connection.
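The P50/P95/P99 dashboards boil down to percentile computations over per-query latency samples. A minimal sketch with Python's standard library (the latency values below are made up for illustration):

```python
import statistics

# Hypothetical per-query latencies in seconds; a real dashboard would
# stream these from production traces.
latencies = [4.2, 5.1, 6.0, 6.3, 7.1, 7.4, 7.9, 8.2, 8.8, 9.5] * 10

# statistics.quantiles with n=100 returns the 99 percentile cut
# points; index k-1 is the k-th percentile.
cuts = statistics.quantiles(latencies, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

print(f"P50={p50:.2f}s  P95={p95:.2f}s  P99={p99:.2f}s")
```

Tracking the tail percentiles, not just the average, matters here: a sub-10s average can still hide individual queries slow enough to stall a live borrower conversation.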


Before:
  • Average response time of ~120 seconds per complex query 

  • Sequential lender search requiring multiple round-trips 

  • Frequent agent retry loops and tool selection failures 

  • High per-query inference costs due to inefficient workflows 

  • Poor user engagement metrics with high abandonment rates 

  • Limited platform usability during live borrower conversations 

  • Scalability concerns due to cost structure 


After:
  • Average response time under 10 seconds for all query types

  • Atomic parallel market scans across all lenders simultaneously 

  • Deterministic workflows eliminating retry loops and hallucinations 

  • 40-60% lower cost per query through infrastructure optimization 

  • Real-time decision support capability during advisor-borrower interactions 

  • Significantly improved user engagement and session completion rates 

  • Sustainable unit economics enabling platform scaling 

Impact Value Metrics:
  • Average Response Time: <10s
  • Latency Reduction: 91%+
  • Cost per Query Reduction: 40-60%

Testimonials

"What used to take nearly two minutes now feels instant. Agent Alpha completely changed how we interact with mortgage data. I can now give borrowers real-time answers during our conversations instead of promising to call them back. This is what I expected AI to be when I first heard about it." 

Mortgage Advisor, Early Adopter
