JupiterBrains delivered ArgaamBot, a fully on-premise, zero-hallucination financial chatbot for Argaam, built on a multi-tier confidence architecture and Arabic-English NLP. The solution combined real-time database querying, news retrieval, and transparent confidence scoring into a unified system that grounded every answer in a verifiable source.
Tiered Confidence Architecture
The core innovation was a three-tier retrieval system that provided answers with explicit confidence labels. Tier 1 (High Confidence) handled queries about structured financial data by generating SQL queries on the fly and executing them directly against Argaam's proprietary databases. When users asked about revenue figures, earnings, balance sheets, or corporate disclosures, the system translated natural language questions into precise SQL statements, retrieved verified data, and returned answers with 100% accuracy and zero hallucination risk.
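The Tier 1 flow can be sketched roughly as follows. This is a minimal illustration, not the production implementation: it assumes the question has already been parsed into a company, a metric, and a fiscal year, and the table name `financials`, its columns, the metric whitelist, and the sample rows are all hypothetical. The sketch shows one way to keep generated SQL safe (a whitelist plus bound parameters) and to decline when no verified row exists.

```python
import sqlite3
from typing import Optional

# Hypothetical whitelist: only these metric names are accepted.
ALLOWED_METRICS = {"revenue", "net_income"}


def build_query(metric: str) -> str:
    """Return a parameterized SQL statement for a whitelisted metric."""
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unsupported metric: {metric}")
    return (
        "SELECT value FROM financials "
        "WHERE company = ? AND metric = ? AND fiscal_year = ?"
    )


def answer_structured(conn: sqlite3.Connection, company: str,
                      metric: str, year: int) -> Optional[dict]:
    """Answer from the database only; return None when no row matches."""
    sql = build_query(metric)
    row = conn.execute(sql, (company, metric, year)).fetchone()
    if row is None:
        return None  # no verified row: decline rather than guess
    return {"value": row[0], "confidence": "High", "sql": sql}


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE financials "
        "(company TEXT, metric TEXT, fiscal_year INTEGER, value REAL)"
    )
    conn.executemany(
        "INSERT INTO financials VALUES (?, ?, ?, ?)",
        [("ExampleCo", "revenue", 2023, 125.0),
         ("ExampleCo", "net_income", 2023, 18.5)],
    )
    return conn
```

Returning the executed SQL alongside the value keeps each answer auditable, which fits the confidence-label design described above.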
Tier 2 (High Confidence) addressed queries requiring context from news articles and market commentary. The system extracted relevant entities and intent from user questions, executed BM25 proximity searches across Argaam's Arabic and English news corpus, refined results using custom proprietary embeddings trained on financial content, and returned answers with full citations including article titles, publication dates, extracted text snippets, and direct links to original sources. Because all sources were internal, verified, and audited by Argaam's editorial team, these answers also carried High Confidence labels.
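To make the lexical retrieval step concrete, here is a small sketch of standard Okapi BM25 scoring over pre-tokenized documents. It covers plain BM25 only: the proximity weighting and the embedding-based refinement described above are not shown, and the parameter defaults `k1=1.5` and `b=0.75` are common textbook values, not settings confirmed by the project.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query_terms: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

With made-up headlines, a query such as `["aramco", "profit"]` scores zero against a document containing neither term, and a shorter matching document outranks a longer one with the same term counts because of length normalization.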
Tier 3 (Low Confidence) provided broader market context by searching controlled external sources for industry trends and global financial news. These answers were explicitly tagged as Low Confidence with complete citations and URLs, allowing users to verify information independently. This tier expanded the chatbot's informational coverage while preserving trust through transparent confidence signaling.
Advanced Bilingual NLP Stack
A specialized Arabic-English processing pipeline was developed to handle the linguistic complexity of Gulf financial markets. The system incorporated Arabic tokenization with morphological analysis to handle the language's complex root-pattern system, neural machine translation fine-tuned on financial terminology to preserve meaning and precision across languages, dual-language embeddings that captured semantic relationships in both Arabic and English, proximity search optimized for Arabic text patterns, and context-preserving translation that maintained financial terms, company names, and numerical data with perfect fidelity.
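As a small illustration of the preprocessing such a pipeline typically needs, the sketch below applies light orthographic normalization to Arabic text before tokenizing. It is an assumption about one common first step, not the morphological analysis itself and not the project's actual pipeline: it strips diacritics and the tatweel elongation character and unifies alef variants so that differently written forms of the same word match at search time. The function names are hypothetical.

```python
import re
from typing import List

# Arabic diacritics (U+064B to U+0652), dagger alef (U+0670), tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")
# Alef with madda, hamza above, or hamza below, folded to bare alef (U+0627).
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize_arabic(text: str) -> str:
    """Strip diacritics and tatweel, and unify alef variants."""
    text = _DIACRITICS.sub("", text)
    return _ALEF_VARIANTS.sub("\u0627", text)


def tokenize(text: str) -> List[str]:
    """Normalize, lowercase Latin letters, and split on non-word characters."""
    return re.findall(r"\w+", normalize_arabic(text).lower())
```

Company names and numbers pass through unchanged apart from lowercasing of Latin letters, so mixed Arabic-English queries yield one consistent token stream.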
On-Premise Financial Small Language Models
To meet strict data governance requirements, the entire system operated on Argaam's on-premise infrastructure using CPU-optimized small language models. Custom embeddings were trained exclusively on Argaam's content without external data exposure. No SaaS products, cloud APIs, or external network connections were required. The isolated deployment environment ensured complete protection of proprietary financial data while delivering sub-second response times.
Real-Time Multi-Source Query Engine
When users submitted questions, ArgaamBot executed three parallel retrieval processes: SQL query generation and execution against structured databases (Tier 1), BM25 and embedding-based searches across Arabic and English news articles (Tier 2), and controlled searches of global financial sources (Tier 3). Results were merged, ranked by relevance and confidence, and presented in a unified interface with transparent confidence labels and complete source attribution.
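The parallel fan-out and merge can be sketched as below. The three retriever functions are hypothetical stubs returning canned answers, and the ordering rule (confidence label first, then relevance within each label) is an assumption: the description above says only that results were ranked by relevance and confidence.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Answer:
    text: str
    source: str
    confidence: str   # "High" or "Low"
    relevance: float  # retriever score, higher is better


# Hypothetical stand-ins for the three tiers.
def tier1_sql(question: str) -> List[Answer]:
    return [Answer("Revenue was 125.0", "structured database", "High", 1.0)]


def tier2_news(question: str) -> List[Answer]:
    return [Answer("Internal article snippet", "internal news", "High", 0.8)]


def tier3_external(question: str) -> List[Answer]:
    return [Answer("External market note", "external source", "Low", 0.95)]


CONFIDENCE_RANK = {"High": 0, "Low": 1}


def retrieve_all(question: str,
                 retrievers: List[Callable[[str], List[Answer]]]) -> List[Answer]:
    """Run all retrievers concurrently, then merge and rank their answers."""
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = [pool.submit(r, question) for r in retrievers]
        results = [a for f in futures for a in f.result()]
    # Assumed ordering: High before Low, then by descending relevance.
    return sorted(results,
                  key=lambda a: (CONFIDENCE_RANK[a.confidence], -a.relevance))
```

Sorting on the confidence label first means a Low Confidence answer never outranks a High Confidence one, even when its relevance score is higher.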
Conversational Financial Intelligence
Beyond simple question-answering, ArgaamBot supported analytical workflows including follow-up questions with conversational context, drill-down into detailed financial reports, extraction of graph-ready time-series data, automatic article summarization, company-level and sector-level comparative insights, and historical trend analysis, all while maintaining the zero-hallucination guarantee through strict source grounding.
Deployment: Fully on-premise infrastructure with isolated network environment
Agents Used: Europa (Financial Reasoning), Himalia (Arabic-English NLP), Sinope (Compliance & Audit)
Technologies: BM25 proximity search, custom financial embeddings, SQL auto-generation, Arabic morphological analysis, tiered confidence scoring
Timeline: 8 weeks from project initiation to production deployment