BBMP e-Khata

98.4% Accuracy in Kannada-English Name Matching for Property Records

The Bruhat Bengaluru Mahanagara Palike (BBMP), Bengaluru's municipal corporation, manages 2.2 million properties. To reconcile its property records, it deployed an AI-powered Kannada-English name matching engine built on domain-specific small language models and the Antarakshar/Apsara-Antara transliteration datasets. The engine achieved 98.4% accuracy, identified 300,000 previously unmatched properties, and reduced manual verification workload by 80%.


Industry: Government & Urban Administration / E-Governance 

Market Size: India's e-governance market reached $1.5 billion in 2024 and is projected to grow to $5.5 billion by 2035 at 12.5% CAGR, driven by digital identity initiatives, data-driven policymaking, and mobile governance solutions[1]. The broader global e-government market was valued at $50.1 billion in 2024 and is expected to reach $90.9 billion by 2033 at 7.5% CAGR[2]. Property tax digitization has emerged as a critical component of urban revenue management, with India's Digital India Land Records Modernization Programme (DILRMP) achieving 95% computerization of land records covering 626,000 villages and 68% digitization of cadastral maps[3]. Urban local bodies across India are increasingly adopting AI-driven solutions for property tax assessment, collection, and citizen service delivery to address revenue leakages and enhance transparency in municipal governance. 

Technology Stack:
  • Domain-specific Small Language Models (SLMs)

  • Kannada-English transliteration engines

  • Phonetic encoding and similarity algorithms

  • Antarakshar/Apsara-Antara datasets

  • Rule-based + ML hybrid scoring systems

  • GAN-driven accuracy refinement

  • On-premise CPU-optimized infrastructure

  • Role-Based Access Control (RBAC)

  • Comprehensive audit logging systems

  • Real-time API integration with e-Khata platform

Region: Bengaluru, Karnataka, India 

Client Profile: BBMP (Bruhat Bengaluru Mahanagara Palike) is one of India's largest municipal corporations, governing 800 square kilometers and serving over 12 million residents. The Revenue Department manages property tax assessment, collection, and khata (property ownership record) issuance for 2.2 million registered properties. The e-Khata initiative aimed to digitize property records by consolidating data from multiple government departments including BBMP's internal property tax database, BESCOM (electricity utility) customer records, and Kaveri (land registry) property ownership data—all containing citizen names in both Kannada and English with significant spelling variations and transliteration inconsistencies.

Key Highlights
  • 98.4% name matching accuracy across Kannada-English transliteration variations 

  • 300,000 previously unmatched properties identified through cross-departmental reconciliation 

  • 80% reduction in manual field verification and document processing workload 

  • Up to 3 lakh (300,000) records processed daily in overnight batch runs

  • 6 weeks deployment timeline from project initiation to production integration 

  • Zero external data transmission through fully on-premise deployment 

  • 2.2 million properties in BBMP jurisdiction now streamlined for e-Khata issuance 

  • Sub-second latency for real-time name matching queries via API 


Challenges

The BBMP e-Khata digital transformation initiative required consolidating property ownership records from three separate government databases—BBMP's property tax system, BESCOM's electricity connection records, and Kaveri's land registry data. The critical bottleneck was that the same citizen's name appeared in dozens of inconsistent variations across these systems, preventing automated record matching and creating massive manual verification workloads that delayed khata issuance by 4-6 weeks. 


Core Challenges 

The name matching problem presented six interrelated technical and operational challenges that conventional string matching algorithms could not solve:


Kannada-English Transliteration Inconsistencies: The same individual's name was recorded differently across databases due to phonetic transliteration variations. For example, a citizen named "ಶ್ರೀನಿವಾಸ" (romanized as Shrinivasa) might appear as "Srinivas," "Sreenivas," "Shrinivas," "Sreenivasa," or "Srinivaas" in English records. The complexity extended to consonant clusters (ಕ್ಷ = "ksha" or "ksh"), retroflex versus dental sounds (ಡ vs ದ), vowel length variations (ಈ vs ಇ), and word-final vowels that English spellings often drop. Traditional exact-match or simple fuzzy-match systems performed poorly on this data, achieving only 55-70% accuracy.


Spelling Variations and Phonetic Drift: Beyond transliteration, records contained spelling errors, regional pronunciation differences, and phonetic approximations. Names with similar sounds but different meanings were conflated, while genuine matches were missed due to single-character differences. Historical data entry by different operators created accumulated spelling inconsistencies across decades of records. OCR-induced errors from scanned legacy documents introduced additional noise, with characters like "ಬ" (ba) confused with "ವ" (va) or "ಮ" (ma). 


Partial Names, Initials, and Missing Surnames: Cultural naming conventions in Karnataka include patronymics, village names, caste surnames, and abbreviations. One database might store "S. Ramesh Kumar" while another recorded "Ramesh Kumar S" or "Ramesh Kumar Shrinivas." Some records used only initials ("S. R. Kumar"), while others spelled out full ancestral names. Matching algorithms needed to identify core name components while handling arbitrary ordering, abbreviations, and missing elements. 


Data Quality Issues Across Legacy Systems: Historical records contained incomplete entries, null values, extraneous punctuation, inconsistent spacing, mixed case formatting, and embedded special characters. Some entries included suffixes like "S/o" (son of), "W/o" (wife of), titles like "Sri," "Smt," honorifics, and nicknames in parentheses. Cleaning and normalizing this data required domain-specific preprocessing that understood Karnataka naming conventions and cultural context. 
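To make the cleanup problem concrete, here is a minimal Python sketch of the kind of normalization such records need. It assumes romanized input only, and the title list, the relation-marker pattern, and the function name `normalize_name` are all hypothetical illustrations rather than the production rules, which the case study does not publish.

```python
import re
import unicodedata

# Assumed, deliberately short list of titles; a real list would be much longer.
TITLES = {"sri", "shri", "smt", "mr", "mrs", "dr"}
# Matches the relation markers S/o, W/o and D/o in any letter case.
RELATION = re.compile(r"\b[swd]/o\b", re.IGNORECASE)


def normalize_name(raw):
    """Return a lowercase, title-free, punctuation-free form of a romanized name."""
    text = unicodedata.normalize("NFKC", raw or "")   # unify compatibility characters
    text = re.sub(r"\([^)]*\)", " ", text)            # drop nicknames in parentheses
    text = RELATION.split(text, maxsplit=1)[0]        # keep only the part before S/o, W/o, D/o
    text = re.sub(r"[^A-Za-z\s]", " ", text)          # punctuation and digits become spaces
    tokens = [t for t in text.lower().split() if t not in TITLES]
    return " ".join(tokens)


print(normalize_name("Smt. LAKSHMI  Devi (Lakku) W/o Ramesh"))  # lakshmi devi
```

Note that this sketch discards the relative's name after the relation marker and strips any non-Latin characters, so Kannada-script entries would need to be transliterated before this step.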


Scale and Performance Requirements: With 2.2 million properties in BBMP jurisdiction and the need to reconcile against BESCOM and Kaveri databases containing millions of additional records, the system needed to process batch matching jobs overnight while simultaneously serving real-time API queries during business hours. Manual verification by 6-8 Assistant Revenue Officers (AROs) per ward was unsustainable. The solution required sub-second response times for citizen-facing khata applications while maintaining high accuracy to prevent false positives that would incorrectly merge unrelated individuals' property records. 


Data Sovereignty and Privacy Constraints: Citizen names, Aadhaar linkages, property ownership records, and tax payment histories constituted highly sensitive personally identifiable information (PII) subject to India's emerging data protection regulations. Cloud-based AI solutions presented unacceptable data exposure risks. The government mandated fully on-premise deployment within BBMP's data center with role-based access controls, comprehensive audit logging, and zero external data transmission. This eliminated the option of using cloud-based large language models or external APIs. 

Our Solution

JupiterBrains delivered a domain-specific Kannada-English name matching engine powered by small language models fine-tuned on Indian name datasets, phonetic encoding algorithms designed for Dravidian languages, and hybrid rule-based plus machine learning scoring. The solution operated entirely on-premise within BBMP's infrastructure, processing millions of records with 98.4% accuracy while maintaining sub-second API response times for real-time citizen queries. 


Custom Transliteration and Phonetic Encoding 

The foundation was a sophisticated transliteration engine trained on the Antarakshar and Apsara-Antara datasets—comprehensive collections of Kannada-English name pairs covering regional variations, phonetic patterns, and transliteration conventions specific to Karnataka. The system normalized Kannada names into standardized phonetic representations, applying letter-stability rules that weighted consonants and initial characters more heavily than vowels and trailing characters, since these remain more consistent across spelling variations. 

The phonetic encoder generated stable signatures for names that captured sound patterns rather than exact spelling. For example, "Shrinivasa," "Srinivas," and "Sreenivas" all mapped to similar phonetic encodings that scored as high-confidence matches despite surface-level string differences. The encoding handled Kannada-specific phonetic features including aspirated consonants, retroflex sounds, vowel nasalization, and consonant clusters common in Sanskrit-derived names. 
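The idea of a stable phonetic signature can be illustrated with a short Python sketch. This is not the deployed encoder: the replacement table and the function name `phonetic_key` are assumptions chosen for illustration, and the sketch works on romanized spellings only. It follows the letter-stability principle described above by keeping the initial character and the consonants while discarding later vowels.

```python
import re

# Hypothetical equivalence table: aspirated digraphs, long vowels, and w/v.
REPLACEMENTS = [
    ("sh", "s"), ("th", "t"), ("dh", "d"), ("bh", "b"), ("kh", "k"), ("gh", "g"),
    ("ee", "i"), ("oo", "u"), ("aa", "a"), ("w", "v"),
]


def phonetic_key(name):
    """Build a coarse signature: first letter plus the consonants that follow."""
    text = re.sub(r"[^a-z]", "", name.lower())        # letters only
    for src, dst in REPLACEMENTS:
        text = text.replace(src, dst)                 # fold spelling variants together
    text = re.sub(r"(.)\1+", r"\1", text)             # collapse doubled letters
    if not text:
        return ""
    # Initial character is kept as-is; later vowels are treated as unstable.
    return text[0] + re.sub(r"[aeiou]", "", text[1:])


for variant in ["Shrinivasa", "Srinivas", "Sreenivas", "Sreenivasa", "Srinivaas"]:
    print(variant, phonetic_key(variant))             # every variant prints srnvs
```

A key this coarse will also group some unrelated names, so it is better suited to selecting candidate pairs than to making the final match decision.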

For ambiguous cases not present in training data, the system employed a controlled fallback to the Google Translate API, using external translation only when internal datasets lacked coverage and sending only the bare name string without associated PII context. This hybrid approach maximized accuracy on common Karnataka names while maintaining reasonable performance on rare or newly encountered name variants.


Hybrid Rule-Based and Machine Learning Scoring

The matching engine combined deterministic linguistic rules with learned similarity patterns. Rule-based components encoded domain knowledge about Karnataka naming conventions: recognizing that "S. Ramesh" and "Ramesh S" likely refer to the same person, handling patronymics and village names that appear in varying positions, and identifying common abbreviation patterns (K. N. for "Karnataka Nagara," D/o for "daughter of"). 
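One way to express the "S. Ramesh" versus "Ramesh S" rule is an order-insensitive token comparison in which a single letter may stand for any token that starts with it. The Python sketch below is an assumption about how such a rule could look, not the engine's actual rule set; the function name `same_person_by_rule` and the requirement of at least one full-token match are illustrative choices.

```python
import re


def _tokens(name):
    """Lowercase alphabetic tokens, so 'S. Ramesh' becomes ['s', 'ramesh']."""
    return re.findall(r"[a-z]+", name.lower())


def _matches(x, y):
    """An initial matches any token with the same first letter; full tokens must be equal."""
    if len(x) == 1 or len(y) == 1:
        return x[0] == y[0]
    return x == y


def same_person_by_rule(a, b):
    """True if every token of the shorter name pairs with a distinct token of the other."""
    short, long_ = sorted((_tokens(a), _tokens(b)), key=len)
    if not short:
        return False
    unused = list(long_)
    full_hits = 0
    for tok in sorted(short, key=len, reverse=True):   # place full tokens before initials
        candidates = [y for y in unused if _matches(tok, y)]
        if not candidates:
            return False
        chosen = tok if tok in candidates else candidates[0]
        unused.remove(chosen)
        if len(tok) > 1 and chosen == tok:
            full_hits += 1
    return full_hits >= 1                              # initials alone are not enough


print(same_person_by_rule("S. Ramesh", "Ramesh S"))    # True
print(same_person_by_rule("S. Ramesh", "Suresh R"))    # False
```

Requiring one exact full-token match keeps pairs such as "S. Ramesh" and "Suresh R" apart even though their initials are compatible. In a hybrid design, the exact comparison would plausibly be relaxed to a phonetic or learned similarity.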

Machine learning models trained on historical BBMP, BESCOM, and Kaveri match pairs learned subtle similarity patterns that rules couldn't capture. The ML component scored phonetic similarity, character edit distance with position-weighted penalties, n-gram overlap, and contextual signals from surrounding address or property data when available. The hybrid scoring system merged rule-based and ML confidence scores, producing a final match confidence percentage for each candidate pair. 
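Two of the ideas in this paragraph, position-weighted edit penalties and the merging of rule and ML scores, can be sketched in a few lines of Python. The decay schedule, the floor value, the linear blend, and the 0.4 rule weight are all assumptions made for illustration; the case study does not state how the production system weights or combines its scores.

```python
def weighted_edit_distance(a, b, decay=0.1, floor=0.3):
    """Levenshtein distance where edits near the start of the name cost more."""
    def cost(pos):
        # Cost of an edit at 0-based position pos, never below the floor.
        return max(floor, 1.0 - decay * pos)

    # prev[j] is the distance between the empty prefix of a and b[:j].
    prev = [0.0]
    for j in range(1, len(b) + 1):
        prev.append(prev[-1] + cost(j - 1))

    for i in range(1, len(a) + 1):
        cur = [prev[0] + cost(i - 1)]
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            substitute = prev[j - 1] + (0.0 if same else cost(min(i, j) - 1))
            delete = prev[j] + cost(i - 1)
            insert = cur[j - 1] + cost(j - 1)
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


def hybrid_confidence(rule_score, ml_score, rule_weight=0.4):
    """Blend two scores in [0, 1] into a confidence percentage (assumed linear blend)."""
    return 100.0 * (rule_weight * rule_score + (1.0 - rule_weight) * ml_score)


# A changed first letter is penalized more than an extra trailing vowel.
print(weighted_edit_distance("ramesh", "tamesh") > weighted_edit_distance("ramesh", "rameshi"))
```

The final comparison prints True: a substitution at the first character costs 1.0, while the trailing insertion costs roughly 0.4 under the assumed decay.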

High-confidence matches (above 95% threshold) automatically proceeded to record linkage. Medium-confidence matches (80-95%) were flagged for human review with specific discrepancy highlights. Low-confidence matches (below 80%) were excluded from automatic processing, preventing false positives that could incorrectly merge unrelated individuals' property records. 
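The three-tier routing described above can be written as a small Python function. The enum and function names are hypothetical, and the treatment of the exact boundary values is an assumption read from the wording: "above 95%" is taken to mean that a score of exactly 95 goes to review, and a score of exactly 80 is also reviewed rather than excluded.

```python
from enum import Enum


class Decision(Enum):
    AUTO_LINK = "auto_link"         # linked without human involvement
    HUMAN_REVIEW = "human_review"   # flagged for review with discrepancy highlights
    EXCLUDED = "excluded"           # kept out of automatic processing


def route(confidence, auto_threshold=95.0, review_threshold=80.0):
    """Map a match confidence percentage to one of the three handling tiers."""
    if not 0.0 <= confidence <= 100.0:
        raise ValueError("confidence must be a percentage between 0 and 100")
    if confidence > auto_threshold:
        return Decision.AUTO_LINK
    if confidence >= review_threshold:
        return Decision.HUMAN_REVIEW
    return Decision.EXCLUDED


for score in (98.4, 95.0, 80.0, 79.9):
    print(score, route(score).value)
```

Keeping the thresholds as parameters makes it straightforward to tighten the automatic tier if reviewers start finding incorrect links.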


GAN-Driven Accuracy Refinement for Edge Cases 

Generative Adversarial Network (GAN) techniques refined the matching engine's performance on difficult cases including nicknames and shortened forms, initials versus full names, missing or transposed middle names, and names with extensive vowel variations. The GAN framework generated synthetic name variation pairs, training the discriminator to distinguish genuine matches from false positives with increasing precision. This adversarial training improved the model's ability to handle edge cases that appeared infrequently in training data but occurred regularly in production. 


Batch Processing and Real-Time API Integration 

The system supported two operational modes addressing different use cases. Overnight batch processing reconciled the complete BBMP, BESCOM, and Kaveri databases, identifying 300,000 previously unmatched property records where ownership could now be definitively established. These batch jobs processed up to 300,000 records daily, running on BBMP's CPU cluster without GPU infrastructure requirements. 

Simultaneously, a real-time API served citizen-facing e-Khata applications and ARO office workflows. When citizens submitted khata requests or officers queried property ownership, the API returned match results with confidence scores in milliseconds. The API integration enabled Assistant Revenue Officers to focus on exception handling rather than routine verification, dramatically improving throughput. 


On-Premise Deployment with Data Governance 

Recognizing the absolute requirement for data sovereignty, the entire system operated within BBMP's secure VPC (Virtual Private Cloud) with network isolation. All processing occurred on CPU-optimized servers, eliminating expensive GPU dependencies and reducing operational costs by 90% compared to GPU-based alternatives. Zero external data transmission occurred during production operations (except controlled Google Translate API calls for rare edge cases, which transmitted only name strings without associated PII context). 

Role-Based Access Control (RBAC) ensured that only authorized revenue department personnel accessed citizen name data, with all queries logged for audit purposes. Comprehensive audit trails recorded every match decision, confidence score, and human review action, providing complete traceability for regulatory compliance and dispute resolution. The governance framework aligned with India's emerging data protection regulations and Karnataka state government data handling policies. 
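As a rough illustration of how access checks and audit logging fit together, the Python sketch below authorizes an action against a role table and appends a JSON record for every attempt, whether it was allowed or denied. The role names, permission sets, record fields, and the in-memory list used as the log are all hypothetical; the case study does not describe the actual schema or storage.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical role table; real roles and permissions are not given in the source.
ROLE_PERMISSIONS = {
    "aro": {"query_match", "review_match"},
    "auditor": {"read_audit_log"},
}


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    user_id: str
    role: str
    action: str
    allowed: bool
    details: dict


def authorize_and_log(user_id, role, action, details, log):
    """Check the role's permissions and record the attempt either way."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    record = AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id=user_id,
        role=role,
        action=action,
        allowed=allowed,
        details=details,
    )
    log.append(json.dumps(asdict(record), sort_keys=True))  # one JSON line per event
    return allowed


audit_log = []
authorize_and_log("officer-17", "aro", "query_match", {"confidence": 97.2}, audit_log)
authorize_and_log("viewer-03", "auditor", "query_match", {}, audit_log)
print(len(audit_log))  # 2
```

Logging denied attempts as well as successful ones is a design choice in this sketch; it gives reviewers a trace of who tried to reach name data without permission.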

Deployment: Fully on-premise within BBMP data center, network-isolated VPC 
Agents Used: Himalia (Document & Data Intelligence for name parsing and matching), Sinope (Compliance & Audit Logging) 
Technologies: Antarakshar/Apsara-Antara datasets, phonetic encoding algorithms, hybrid rule-based + ML scoring, GAN-driven refinement, CPU-optimized inference 
Timeline: 6 weeks from project initiation to production deployment with e-Khata integration 

Before
  • Name matching accuracy across BBMP, BESCOM, and Kaveri databases: 55-70% 

  • Manual verification required 6-8 Assistant Revenue Officers per ward for routine cases 

  • Property ownership transfers and khata issuance delayed by 4-6 weeks 

  • Frequent duplicate khata entries due to spelling variation confusion 

  • Processing limited to business hours with manual lookups taking minutes per case 

  • 300,000 properties remained unmatched across departmental databases 

  • Limited audit trail for match decisions creating dispute resolution challenges 

  • No systematic approach to handling Kannada-English transliteration variations 

Results After Usage
  • 98.4% name matching accuracy achieved across multilingual departmental databases 

  • 80% reduction in manual field verification and document processing workload 

  • Sub-second API response times enabling real-time khata application processing 

  • 300,000 previously unmatched properties successfully identified and reconciled 

  • Near-zero false positive rate preventing incorrect record mergers 

  • 24×7 automated batch processing capability handling 300,000 records daily 

  • Complete audit trails with confidence scoring for every match decision 

  • AROs now focus exclusively on exception cases requiring human judgment

Impact Value Metrics

  • 98.4%: Name Matching Accuracy

  • 80%: Manual Verification Reduced

  • 300,000: New Propert

Business Impact

The name matching engine delivered transformative improvements in BBMP's property tax administration and citizen service delivery. The 98.4% accuracy across Kannada-English transliteration variations solved a problem that had plagued Bengaluru's property records for decades, enabling the e-Khata initiative to achieve its core objective of unified digital property ownership records. 

The identification of 300,000 previously unmatched properties represented a major revenue discovery opportunity. These properties, which existed in electricity connection or land registry databases but had no corresponding BBMP tax records, represented significant untapped property tax revenue. The automated reconciliation enabled BBMP to bring these properties into the formal tax system, strengthening municipal finances and ensuring equitable tax burden distribution. 

Operationally, the 80% reduction in manual verification workload freed Assistant Revenue Officers from repetitive data entry and lookup tasks, allowing them to focus on complex cases requiring domain expertise, fraud investigation, dispute resolution, and citizen relationship management. The productivity improvement enabled BBMP to handle growing khata application volumes without proportional staffing increases, improving fiscal efficiency. 

Citizen experience improved markedly: preliminary property ownership verification, which previously sat behind a 4-6 week manual process, now returns in under a second. The real-time API integration meant that citizens initiating property transactions, applying for building permits, or requesting khata certificates received instant preliminary verification, with only exceptional cases requiring human review. This speed and transparency strengthened public trust in BBMP's digital services.

The on-premise deployment model addressed critical data sovereignty concerns, demonstrating that sophisticated AI capabilities could operate within strict government infrastructure constraints without compromising citizen privacy. This architectural approach established a replicable template for other Karnataka Urban Local Bodies (ULBs) and Indian municipal corporations facing similar property record challenges. 

Strategically, the success positioned BBMP as a leader in urban governance innovation, showcasing how domain-specific AI solutions tailored to Indian linguistic and cultural contexts could outperform generic cloud-based alternatives. The project validated the feasibility of CPU-optimized small language models for government-scale deployments, achieving 90% cost savings compared to GPU infrastructure while maintaining high accuracy. 

The comprehensive audit trail and confidence scoring system strengthened BBMP's position in property ownership disputes, providing transparent, data-driven evidence for administrative and legal proceedings. The ability to trace every match decision with documented confidence levels reduced litigation risks and accelerated dispute resolution. 

Looking forward, the name matching engine's architecture enables expansion to additional use cases including beneficiary identification for government schemes, duplicate ration card detection, voter registration reconciliation, and cross-departmental citizen data consolidation—all critical capabilities for effective urban governance in rapidly growing Indian cities. 

Testimonials

"JupiterBrains delivered what we thought was impossible—a 98%+ accurate name matching engine for Kannada and English that works fully on-premise within our data center. The system identified 300,000 properties we couldn't match manually, transforming our e-Khata processing from weeks to milliseconds. The phonetic encoding algorithms understand Karnataka naming conventions better than any generic AI solution could. This has been transformative for Bengaluru's property tax administration."

Senior Revenue Officer, BBMP (Bruhat Bengaluru Mahanagara Palike)
