Seetha Mahalaxmi Healthcare (SML), in partnership with the IIT Bombay-led BharatGPT ecosystem, has unveiled ‘Hanooman’, a series of large language models trained on 22 Indian languages. Backed by Reliance and scaling up to 40 billion parameters, Hanooman targets ChatGPT-style Indic AI services.
What is Hanooman
Hanooman is a suite of large language models of varying sizes, up to 40 billion parameters, trained on text in 22 constitutionally recognised Indian languages. The models have been developed by Pune-based Seetha Mahalaxmi Healthcare along with the IIT Bombay-led BharatGPT research consortium.
Hanooman’s Scale
The Hanooman series comprises multiple foundation models ranging from 1.5 billion to 40 billion parameters. The four smaller models in the series will be open-sourced next month. Parameter count, the number of learned weights in a model, is a rough indicator of its capacity to capture language patterns from the text it is trained on.
Multilingual Prowess
While still a work in progress, Hanooman can currently comprehend and respond in 11 regional languages, including Hindi, Marathi, Tamil, Telugu and Malayalam. Efforts are ongoing to extend its interactive abilities to all 22 languages as more data becomes available.
BharatGPT Ecosystem’s Support
Hanooman has the backing of BharatGPT, an initiative to advance India-centric AI built on local-language datasets. The initiative is nurtured by IIT Bombay along with seven other IITs, the Department of Science and Technology, SML and Reliance Jio.
Multimodal Strengths
Hanooman has multimodal capabilities and can generate text, speech, video and cross-format content. This allows richer human-computer interaction spanning text, visual and voice inputs and outputs in vernacular languages.
Enterprise, Consumer Applications
SML is engaging with banking, healthcare and mobile app enterprises, which could use Hanooman through a focused models-as-a-service offering or fine-tune it into tailored vertical solutions. Consumers may get access through Reliance’s upcoming ChatGPT rival app.
Sourcing Language Data Still Challenging
Experts involved in Hanooman stress that sourcing Indian-language datasets of sufficient quality and volume remains the foremost hurdle to realising the full potential of Indic AI and achieving usage at scale. But the model launches mark a promising start.
As per reports, the launch is expected in March.