The Evolution of Siri and Apple's Gen AI
For many 90s kids, their first encounter with AI was "Siri", the AI-powered voice assistant launched alongside the iPhone 4S in 2011.
This was also when many of them got their first hands-on experience of an iPhone, something not every 90s kid had enjoyed until then.
Siri made life easier by handling tasks ranging from placing a call to setting an alarm.
It was new, it was fun, and it made life a tad easier, if not a lot.
In recent years, however, there has not been much movement in "Siri". In the interim, the world witnessed an AI explosion, and companies across almost every field are planning to integrate AI into their business processes.
Amid this explosion and adoption, it is now being reported that our own "Siri" will also undergo this transformation and enter the AI pipeline, only to come out smarter at the other end.
Apple's work on generative AI for Siri has been doing the rounds in the technology sector for some time.
The Rise of Ferret-UI and its Impact on Mobile AI
According to a new research paper published on arXiv, which is hosted by Cornell University, there is a new MLLM (Multimodal Large Language Model) that understands how a phone's interface works.
Titled "Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs", the paper throws light on how, despite coming a long way, the technology is still not fully successful when it comes to interacting with user interface screens.
Ferret-UI is an MLLM being developed with the aim of understanding UI screens and how the apps on a phone work. The paper claims the model has "referring, grounding, and reasoning capabilities."
It builds on Ferret, an open-source MLLM released in October 2023 by Apple in collaboration with Cornell University, the result of extensive research into how large language models can recognise and understand elements within pictures. This implies that any UI with Ferret running underneath can handle queries much the way ChatGPT or Gemini does.
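To make that concrete, here is a minimal sketch of the kind of screenshot-plus-question exchange such a model enables. The FerretUILike class, its answer() method and the returned coordinates are hypothetical stand-ins for illustration, not Apple's actual API; real inference would load the open-source checkpoints.

```python
# A sketch of Ferret-style interaction: a UI screenshot and a free-form
# question go in, a grounded answer (text plus a bounding box) comes out.

from dataclasses import dataclass
from typing import Tuple

@dataclass
class GroundedAnswer:
    text: str                       # natural-language answer
    box: Tuple[int, int, int, int]  # bounding box of the referenced element (x1, y1, x2, y2)

class FerretUILike:
    """Hypothetical wrapper around a Ferret-style multimodal model."""

    def answer(self, screenshot_path: str, question: str) -> GroundedAnswer:
        # A real implementation would encode the image, tokenize the question
        # and run the multimodal LLM; here we return a canned example.
        return GroundedAnswer(text="The blue 'Book' button starts a reservation.",
                              box=(312, 1040, 468, 1112))

model = FerretUILike()
result = model.answer("home_screen.png", "What does the blue button at the bottom do?")
print(result.text, result.box)
```

The point of the grounded answer is that the model does not just describe the screen; it ties its reply to a specific region, which is what "referring" and "grounding" mean in the paper's terminology.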
Smartphone displays come in diverse aspect ratios and are packed with compact visual elements, and this is one of the primary challenges in improving AI's understanding of app screens.
What the model does is magnify the details and leverage the enhanced visual features to recognise even the smallest icons and buttons. The paper further explains that, in its ability to understand and interact with app interfaces, Ferret-UI has far surpassed contemporary models.
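A rough sketch of that magnification idea is below, under the assumption that the screenshot is split into sub-images along its longer axis before encoding, in addition to a low-resolution global view. The split rule and the 336-pixel target size are illustrative assumptions, not Apple's exact preprocessing.

```python
# Simplified sketch: besides one resized global view, the screenshot is cut
# into sub-images based on its aspect ratio (horizontally for portrait,
# vertically for landscape) so small icons and buttons survive the
# encoder's downscaling.

from PIL import Image

def split_screen(path: str, encoder_size: int = 336):
    img = Image.open(path)
    w, h = img.size
    views = [img.resize((encoder_size, encoder_size))]  # low-resolution global view

    if h >= w:  # portrait screen: top and bottom halves
        halves = [img.crop((0, 0, w, h // 2)), img.crop((0, h // 2, w, h))]
    else:       # landscape screen: left and right halves
        halves = [img.crop((0, 0, w // 2, h)), img.crop((w // 2, 0, w, h))]

    # Each half is resized on its own, so its details are effectively
    # magnified relative to the single global view.
    views += [half.resize((encoder_size, encoder_size)) for half in halves]
    return views

# Usage: split_screen("home_screen.png") returns three views the encoder can consume.
```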
This digital assistant is touted to execute complex tasks within these apps in the future. In practice, that would mean Siri could undertake complex tasks such as booking a flight or making a reservation, since the assistant itself would interact with the app on the user's behalf.
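One way to picture this, purely as a hypothetical sketch, is a "see, ask, act" loop in which each step of a booking flow becomes a screenshot, a grounding query and a simulated tap. None of the functions below is a real Siri or Ferret-UI API; they are stand-ins for the pieces such a pipeline would need.

```python
# Hypothetical "see, ask, act" loop: a grounded UI model locates the control
# for each step of a task, and the assistant taps it.

from typing import Tuple

def take_screenshot() -> str:
    return "current_screen.png"      # stand-in: path to the latest screen capture

def locate(screenshot: str, instruction: str) -> Tuple[int, int]:
    # A Ferret-style model would return the bounding box of the matching
    # widget; here we return a fixed centre point for illustration.
    return (390, 1076)

def tap(point: Tuple[int, int]) -> None:
    print(f"tapping at {point}")     # stand-in for an OS-level tap event

# Each step of "reserve a table for two at 7pm" becomes:
# screenshot -> ask the model where the relevant control is -> tap it.
for step in ["open the reservations tab", "pick 7:00 PM", "confirm the booking"]:
    screen = take_screenshot()
    tap(locate(screen, step))
```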