Researchers Train Large Language Models to Identify How a Virus Evolves

The research aims to uncover the evolution of SARS-CoV-2 variants via LLMs.

How a virus variant progresses differs from strain to strain. With the help of artificial intelligence, a group of researchers from the US Department of Energy's (DOE) Argonne National Laboratory has developed a new technique to track the evolution of a virus.

They are now working to identify how SARS-CoV-2 evolves so that they can learn more about future variants of the virus.

Genomic Sequence to Track COVID-19 Variants


According to Newswise, scientists from DOE's Argonne Lab won the 2022 Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research.

Their new paper, written with other collaborators, examines how genomic sequences can help identify COVID-19 variants.

The researchers developed large language models (LLMs). Such models are first trained by predicting the next word in a sequence of text.

Models trained on language datasets can then be adapted to tasks such as translating English text into Spanish. In a similar way, the researchers adapted LLMs to help understand how a virus mutates into a deadly variant.
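The next-word objective described above can be illustrated with a very small sketch. This is not the researchers' model: it is a toy bigram counter, and the function names `train_bigram` and `predict_next` and the sample sentence are made up for illustration. Real LLMs learn the same kind of prediction with neural networks and far larger datasets.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text, chosen only to keep the example readable.
corpus = "the virus evolves and the virus spreads and the virus evolves".split()
model = train_bigram(corpus)

print(predict_next(model, "virus"))
```

In this toy corpus, "virus" is followed by "evolves" twice and "spreads" once, so the sketch picks "evolves" as the most likely next word.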

As a virus evolves, it can become a more dangerous version of itself. Scientists label such variants as variants of concern (VOCs).

Identifying them helps scientists anticipate how a variant will behave so they can develop new treatments and new vaccines to combat the virus.

"When the pandemic began, we had several of these really harmful variants of the virus, like the Delta variant," Argonne's computational biologist Arvind Ramanathan said.

Their paper, titled "GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics," describes a trained model that tracks VOCs and the associated genes.

The scientists trained the model on data collected during the COVID-19 pandemic, with the aim of giving health organizations and officials an effective tool to identify VOCs.

GenSLM serves the same purpose as existing VOC identification methods, but it differs in that it relies on a genome-scale foundation model.
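To give a sense of how a genome can be fed to a language model at all, the sketch below turns a nucleotide string into a list of tokens and integer IDs, much as text is split into words. The article does not say how the researchers tokenize genomes, so splitting into three-letter codons is an assumption made here for illustration, and `codon_tokens`, `build_vocab`, and the sample sequence are hypothetical.

```python
def codon_tokens(sequence):
    """Split a nucleotide string into non-overlapping 3-letter tokens."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # drop trailing bases that do not fill a codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def build_vocab(tokens):
    """Map each distinct token to an integer ID, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


# Made-up sequence, not real SARS-CoV-2 data.
toy_sequence = "ATGTTTGTTTTTCTTGTT"
tokens = codon_tokens(toy_sequence)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]

print(tokens)
print(ids)
```

Once a sequence is expressed as token IDs like these, it can be treated like a sentence, and a model can be trained to predict the next token in the same way it would predict the next word.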

How Large Language Models Help Scientists

Previously, experts knew only that LLMs could capture how a protein and its structure evolve. Ramanathan and the other scientists have now applied this kind of model at the gene level for the first time.

According to the study's co-author Venkatram Vishwanath, the large language models are helpful in "achieving the AI for science vision across diverse science domains."

Early tests showed that LLMs are effective at tracking how a protein evolves. The researchers later found that the same models can also track VOCs, per Medical Xpress.

The success of the Argonne study is also attributed to supercomputing resources. Beyond distributing the work efficiently, the study drew on powerful hardware, including Selene and Polaris, two GPU-accelerated supercomputers.

Joseph Henry
Tech Times
ⓒ 2024 TECHTIMES.com All rights reserved. Do not reproduce without permission.