Disaster can occur when training one chatbot with another
Whistleblowers say that some people hired to produce high-quality conversations for training AI models are cheating by having chatbots such as ChatGPT do the work for them. The practice, reported by New Scientist, threatens the integrity of AI development and could contribute to a “breakdown” of sophisticated models.
Most AI models are currently trained on vast amounts of data sourced from the internet. As demand for training data grows, AI firms are hiring people to interact with and test AI systems in the hope of improving future large language models (LLMs).
These workers are often contracted through third-party agencies and typically earn low wages without stable employment. One worker, Alice*, says these conditions encourage rule-breaking, such as relying on chatbots to get through tasks faster, despite clear company policies against it.
“It’s very prevalent. Every organization I’ve been a part of has strict guidelines and attempts to monitor compliance. However, stopping it completely seems unlikely,” Alice explains.
Alice says she has no remorse about using ChatGPT to finish training tasks. “As long as you guide the chatbot to avoid recognizable AI signatures, escaping detection is easy. The less careful users are the ones caught.”
“If companies desire high-quality data, they must offer fair contracts,” Alice asserts. “Instead, they exploit struggling individuals, retaining them until project completion and then abruptly terminating their contracts.”
Bob*, a worker on a training platform called Outlier, initially used AI to complete training tasks himself before being promoted to a leadership role in which he monitored other workers for the same behavior.
“Management oscillates between mild tolerance and strict prohibition,” Bob recounts. Employees at Outlier are monitored via Hubstaff, which captures desktop screenshots at random intervals to confirm adherence to task requirements.
“You can often see AI tools like ChatGPT on the taskbar, either minimized or open in another tab,” Bob says, a sign of how widespread AI use is among workers.
Outlier, which is owned by Scale AI, did not respond to requests for comment. Scale AI says it works with tech giants including Meta and Cisco, neither of which responded either. Bob says he worked on projects for Google, which also did not respond.
Carol*, another worker with experience across several platforms, says she first used AI to check her work for violations of task guidelines, fearing that she would be expelled and lose her income.
“Initially concerned about my income source, I found it easier to accomplish tasks via the LLM,” Carol states. “Many of my current projects involve scenario creation, so I employ one LLM to devise the scenarios and another for generating the corresponding files.”
“I’m worried that this practice undermines AI quality,” she adds, expressing concern about training models with AI-generated content.
Mark Lee, a researcher at the University of Birmingham, UK, warns that training AI on AI-generated content can lead to “cannibalism” of models, ultimately diminishing their capability. “While this worst-case scenario may not always happen, the misconduct reflected in these practices undoubtedly hampers performance,” Lee states.
“A human data presence, even at 10%, can significantly mitigate these issues, ensuring the models do not falter,” he adds. The workers' accounts also point to a basic weakness in the technology: it still struggles to reproduce human ingenuity, which is why human-made training data is in demand in the first place.
*Names have been changed to protect personal identities.
Source: www.newscientist.com











