Grok AI Delivers Near-Instantaneous Responses, Outpacing Common Models

Elon Musk's Grok AI can now respond at "Flash-Like" speed, reportedly generating approximately 1,256.54 tokens per second, and has introduced new features such as support for other AI models.

Grok AI's responses are practically immediate, a speed that GPUs from firms such as Nvidia reportedly cannot match. The pace is up from Grok's previous record of 800 tokens per second in April.
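To put the two reported figures in perspective, the short Python sketch below works out the relative gain and an idealized generation time. The 500-token reply length is a hypothetical value chosen for illustration, and the calculation assumes a constant output rate with no network or queuing delay, which real deployments will not match exactly.

```python
# Throughput figures quoted in the article (tokens per second).
CURRENT_TPS = 1256.54
PREVIOUS_TPS = 800.0


def speedup_percent(new_tps: float, old_tps: float) -> float:
    """Relative throughput gain of new_tps over old_tps, in percent."""
    return (new_tps / old_tps - 1.0) * 100.0


def generation_seconds(num_tokens: int, tps: float) -> float:
    """Idealized time to emit num_tokens at a constant rate of tps."""
    return num_tokens / tps


if __name__ == "__main__":
    reply_tokens = 500  # hypothetical reply length, for illustration only
    gain = speedup_percent(CURRENT_TPS, PREVIOUS_TPS)
    print(f"Throughput gain over April: {gain:.1f}%")
    print(f"At 800 tok/s:     {generation_seconds(reply_tokens, PREVIOUS_TPS):.3f} s")
    print(f"At 1256.54 tok/s: {generation_seconds(reply_tokens, CURRENT_TPS):.3f} s")
```

On these numbers the new rate is a little over one and a half times the April figure, so a reply of that length would drop from 0.625 seconds to under 0.4 seconds.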

This illustration photograph taken in Helsinki on June 12, 2023, shows an AI (Artificial Intelligence) logo blended with four fake Twitter accounts bearing profile pictures apparently generated by artificial intelligence software. (Olivier Morin/AFP via Getty Images)

Grok's site engine defaults to Meta's open-source Llama3-8b-8192 LLM, but it also supports the larger Llama3-70b as well as certain Gemma (Google) and Mistral models, with more to be added shortly.

Users can now speak their questions to the Grok engine by pressing a microphone icon instead of typing them in. Grok uses Whisper Large V3, OpenAI's most recent open-source automatic speech recognition and translation model, to convert the user's speech into text. That text is then used as the prompt for the LLM.
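The voice flow described above is a two-stage pipeline: speech is transcribed, and the transcript becomes the prompt. The Python sketch below shows only that shape. The function names `voice_to_answer`, `fake_transcribe`, and `fake_generate` are hypothetical stand-ins invented for illustration; they do not call Whisper or any real LLM service, and the actual implementation is not described in the reporting.

```python
from typing import Callable


def voice_to_answer(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
) -> str:
    """Run speech-to-text, then pass the transcript to the LLM as its prompt."""
    prompt = transcribe(audio).strip()
    if not prompt:
        # Nothing was recognized, so there is no prompt to send.
        raise ValueError("no speech recognized")
    return generate(prompt)


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    """Pretend speech recognizer: treats the bytes as UTF-8 text."""
    return audio.decode("utf-8")


def fake_generate(prompt: str) -> str:
    """Pretend LLM: echoes the prompt it was given."""
    return f"[model reply to: {prompt}]"


if __name__ == "__main__":
    print(voice_to_answer(b"Write a job post for a chef", fake_transcribe, fake_generate))
```

Passing the two stages in as functions keeps the sketch independent of any particular speech or language model, which matches the article's point that the underlying models can be swapped.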

The experience matters because it shows programmers and non-programmers alike how fast and flexible an LLM chatbot can be. Jonathan Ross, Grok's CEO, believes that its appeal will grow once customers see how easy it is to run LLMs on Grok's fast engine.

For example, the demo shows other tasks that can be completed quickly, such as drafting job postings or articles and revising them on the fly.

Grok AI Speed Implications

Grok has received attention because it claims to perform AI tasks considerably faster and more cheaply than competitors. It attributes this to its language processing unit (LPU), which it says is far more efficient than GPUs at similar tasks, in part because the LPU operates linearly.

While GPUs are vital for model training, deployed AI applications demand more efficiency and lower latency. ("Inference" refers to the work a trained model does once it is in use.)

Like other inference providers, Grok gives developers a console from which to build their applications. Developers who have built apps on OpenAI can migrate them to Grok in seconds with a few easy steps.
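The article does not spell out what those steps are. A common pattern for switching between API-compatible providers is to change only the endpoint, the key, and the model name while leaving application code alone, and the Python sketch below illustrates that idea under this assumption. The `ClientConfig` class, the `migrate` helper, the URLs, and the keys are all hypothetical placeholders, not a real client library or real endpoints; only the model name `llama3-8b-8192` comes from the article.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical connection settings an app might hold for an LLM provider."""
    base_url: str
    api_key: str
    model: str


def migrate(config: ClientConfig, base_url: str, api_key: str, model: str) -> ClientConfig:
    """Return a copy of config that points at a different provider.

    Assumes the new provider accepts the same request format as the old one,
    so nothing else in the application needs to change.
    """
    return replace(config, base_url=base_url, api_key=api_key, model=model)


if __name__ == "__main__":
    # Placeholder values only; none of these are real endpoints or credentials.
    old = ClientConfig(
        base_url="https://old-provider.example/v1",
        api_key="OLD_KEY_PLACEHOLDER",
        model="old-model-name",
    )
    new = migrate(
        old,
        base_url="https://new-provider.example/v1",
        api_key="NEW_KEY_PLACEHOLDER",
        model="llama3-8b-8192",  # default model named in the article
    )
    print(new)
```

If the compatibility assumption holds, a migration really is a matter of three settings, which would be consistent with the "in seconds" claim.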

Grok AI Hallucinations

Grok AI's latest capabilities come after the same AI was observed hallucinating news items on X (previously Twitter). The Grok AI chatbot trended in April after fabricating a news item about vandalism and violent conduct by Klay Thompson in Sacramento, California, even though the NBA player committed no such acts.

In reality, the posts it drew on were about a poor game by the Golden State Warriors shooting guard, who was "shooting bricks" at 0 for 10.

The posts followed last season's NBA Play-In game, in which the Warriors traveled to Sacramento to battle the Kings for a spot in the postseason and a chance at the championship.

However, Warriors shooting guard Klay Thompson had a bad night, shooting a disheartening 0 for 10 from the field and 0 for 6 from three-point range in his 32 minutes of play, a performance referred to as "shooting bricks."

Grok misinterpreted this as an act of destruction and violence, reporting on a fabricated crime spree by Thompson in Sacramento.

Many X users took the opportunity to mock Grok and "Game 6 Klay" at the same time, playing along with the false crime story and posting Thompson's numbers for the night. In response to Grok's story, another user claimed to have been struck by a brick and posted a photo of an injured man.