Biotechnology Firms Seek to Develop the "ChatGPT of Biology": Does It Deliver?

Basecamp researchers gather genetic data in Malta

Greg Funnell

A British biotech firm, Basecamp Research, has spent recent years gathering extensive genetic data from microorganisms inhabiting extreme environments worldwide, uncovering nearly 10 billion new genes from over a million species new to science. This vast database of planetary biodiversity is intended to help train a “ChatGPT of biology” that can answer questions about life on Earth, although its effectiveness remains uncertain.

Jorg Overmann from the Leibniz Institute DSMZ, which houses one of the world’s most extensive collections of microbial cultures, asserts that while an increase in known genetic sequences is beneficial, it likely won’t lead to significant discoveries in drug development or chemistry without deeper insights into the organisms from which they originated. “In the end, I’m skeptical that a better understanding of unique features will be achieved merely through brute force in the sequencing domain,” he remarks.

Recent years have seen a surge in machine learning models aimed at identifying patterns and predicting relationships within vast biological datasets. The best known of these is AlphaFold, which can predict the 3D structure of proteins using only genetic data; its developers at Google DeepMind shared the 2024 Nobel Prize in Chemistry.

This “genometric biology” approach has grown significantly, but according to Francis Din at the University of California, Berkeley, progress has been limited. One reason for this is the underrepresentation of biodiversity data. “Current biological models are primarily trained with datasets that favor well-studied species (e.g., E. coli, mice, humans), leading to poor prediction capabilities for traits associated with sequences from other branches of the Tree of Life,” she explains.

Basecamp researchers aim to bridge this biodiversity gap. Their expanding database now includes samples from over 120 locations across 26 countries, as detailed in a report by the company. Jonathan Finn, the company’s Chief Science Officer, notes that their sampling efforts target extreme environments that have yet to be thoroughly examined, ranging from the icy depths of the Arctic Ocean to hot springs in warm jungles. “Most of the samples we’re prioritizing are prokaryotic: bacteria, microorganisms, and their viruses,” Finn states. “We are also aware that some fungi are present.”

Genetic analyses of these samples have illuminated gene variations that are broadly shared across the Tree of Life. Based on this research, the company estimates that its data encompass genetic information from over a million species not found in the public genomic databases used to train AI models. This includes around 9.8 billion newly identified genes, a tenfold increase in the number of known genes, each potentially encoding a useful protein, according to the researchers.

“By providing these models with richer data, we enhance our understanding of biological mechanisms,” Finn explains. “We aim to create a ChatGPT for Biology.”

It’s estimated that Earth hosts trillions of microorganism species, many of which remain poorly characterized. Thus, it’s not unexpected that the company has identified such a wealth of novel life forms. “As we explore more, discovering diverse gene variants becomes almost inevitable,” notes Leopold Parts at the Wellcome Sanger Institute in the UK.

Nevertheless, Basecamp promotes the notion that all newly discovered materials might hold value. It’s not alone in this sentiment. “This is among the most thrilling advances I’ve encountered in quite some time,” remarks Nathan Frey, a machine learning researcher at Genentech, a US biotech firm. He emphasizes that most AI biology projects focus on algorithm improvement or generating additional lab data rather than venturing out to collect samples directly from nature.

However, there is skepticism about whether this database will yield the meaningful advances the company hopes for. For starters, it remains uncertain how much of this newfound protein diversity reflects valuable new functions, such as enzymes that can degrade plastic or proteins useful for gene editing. “They must demonstrate that this novelty has practical utility,” cautions Parts.

Moreover, if the new genes differ significantly from known genes, Overmann doubts how well existing tools can predict their functions or how such data can be used to train new models. “I can’t discern the functions of most of my genes,” he states. The company may have created a valuable new repository of biological data, but without traditional lab work to characterize these genes, even the most advanced AI may struggle to interpret it.

Source: www.newscientist.com
