How ‘AI Poisoning’ Battles the Bots That Hoover Up Data Without Consent

The landscape of the internet is shifting away from human users and towards automated web-browsing bots. A recent report indicates that this year, for the first time, non-human bots make up the majority of all web traffic.

Alarmingly, over half of this bot traffic stems from malicious sources, including bots that harvest unsecured personal data online. A growing share, though, comes from bots operated by artificial intelligence companies, which gather data for model training and fetch content in response to user queries. Notably, OpenAI’s ChatGPT-User accounts for 6% of total web traffic, while ClaudeBot, created by Anthropic, represents 13%.

AI firms argue that data scraping is crucial for keeping their models updated, while content creators voice concerns about these bots being tools for vast copyright violations. Earlier this year, Disney and Universal took legal action against AI company Midjourney, claiming that its image generators were reproducing characters from popular franchises such as Star Wars and Despicable Me.

Given that most creators lack the financial means for prolonged legal battles, many have turned to other methods to protect their content. They deploy online tools that make scraping harder for AI bots, for instance by misleading them so that AI models confuse images of cars with images of cows. While this “AI poisoning” tactic helps safeguard creators’ work, it may also introduce new risks on the web.

Copyright Concerns

Historically, imitators have profited off artists’ work, which is primarily why intellectual property and copyright laws exist. The advent of AI image generators like Midjourney and OpenAI’s DALL-E has exacerbated this issue.

A key concern in the U.S. is the fair use doctrine, allowing limited usage of copyrighted materials without permission under certain circumstances. While fair use laws are designed to be flexible, they hinge on the principle of creating something new from the original work.

Many artists and advocates believe that AI technologies blur the line between fair use and copyright infringement, harming content creators. For example, while drawing an image of Mickey Mouse in The Simpsons universe for personal use may be harmless, AI can produce and circulate similar images at scale, making it harder to argue the use is transformative and often shading into commercial exploitation.

In an effort to protect their commercial interests, some U.S. creators have pursued legal action, with Disney and Universal’s lawsuits against Midjourney being among the latest examples. Other notable cases include an ongoing legal dispute involving the New York Times and OpenAI regarding alleged misuse of newspaper stories.

Disney sues Midjourney over its image generator.

Photo 12/Alamy

AI companies firmly deny any wrongdoing, asserting that data scraping is permissible under the fair use doctrine. In an open letter to the US Office of Science and Technology Policy in March, OpenAI’s Chief Global Affairs Officer, Chris Lehane, warned against the stricter copyright rules in force elsewhere in the world, and recent attempts to strengthen copyright protections for creators have been criticized as potentially stifling innovation and investment. OpenAI has previously claimed it would be “impossible” to build AI models that meet users’ needs without drawing on copyrighted work. Google takes a similar stance, arguing that copyright, privacy, and patent laws create barriers to accessing the training data its models need.

For now, public sentiment seems to align with the activists’ viewpoint. Analysis of public feedback on copyright and AI inquiries by the U.S. Copyright Office reveals that 91% of comments expressed negative sentiments regarding AI.

The lack of public sympathy for AI firms partly reflects the overwhelming traffic their bots create, which can strain resources and even knock some websites offline, and content creators often feel powerless to stop them. There are ways to opt out of content-crawling bots, such as adding rules to a site’s robots.txt file asking crawlers to stay away, but these requests are sometimes ignored.
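To illustrate what such an opt-out looks like in practice, here is a minimal robots.txt sketch that asks some commonly documented AI crawlers not to visit a site. The user-agent names below are published by the respective companies but change over time, so treat the list as indicative rather than authoritative, and note that compliance is entirely voluntary on the crawler’s side.

```
# robots.txt — placed at the root of a website
# Ask specific AI crawlers (names as published by the companies) to stay away.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```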

Combating AI Data Scraping

Consequently, new tools have emerged that let content creators better shield their work from AI bots. This year, Cloudflare, an internet infrastructure company best known for protecting websites from distributed denial-of-service (DDoS) attacks, launched technology to combat unwanted AI bots. Its approach generates a labyrinth of AI-written pages filled with nonsensical content, keeping AI bots occupied and away from genuine information.

A tool called AI Labyrinth is designed to handle the 50 billion requests per day that AI crawlers generate, according to Cloudflare. The objective of AI Labyrinth is to “slow, confuse, and waste the resources of AI crawlers and other bots that disregard the ‘no crawl’ directive.” Cloudflare has since introduced another tool that makes AI companies pay for access to websites or restricts their use of raw content.
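Cloudflare has not published AI Labyrinth’s implementation details, but the underlying “tarpit” idea can be sketched in a few lines of standard-library Python. The toy server below is an illustration only, not Cloudflare’s system: requests whose User-Agent matches a list of example crawler names get an endless chain of machine-generated decoy pages, while ordinary visitors see the real content. The bot names, the /maze/ route, and the filler text are all assumptions made for the example.

```python
# Toy "crawler tarpit": decoy pages for suspected AI crawlers, real content for everyone else.
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

AI_CRAWLER_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended")  # example names only
WORDS = "lorem ipsum dolor sit amet consectetur adipiscing elit".split()

def decoy_page(depth: int) -> str:
    """Build a nonsense page whose links only lead deeper into the maze."""
    filler = " ".join(random.choices(WORDS, k=200))
    links = "".join(
        f'<a href="/maze/{depth + 1}/{random.randint(0, 10**9)}">more</a> '
        for _ in range(5)
    )
    return f"<html><body><p>{filler}</p>{links}</body></html>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if any(bot in agent for bot in AI_CRAWLER_AGENTS) or self.path.startswith("/maze/"):
            body = decoy_page(self.path.count("/"))          # send the crawler into the maze
        else:
            body = "<html><body><h1>Real content lives here</h1></body></html>"
        data = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), Handler).serve_forever()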

An alternative strategy involves allowing AI bots to access online content while subtly “poisoning” it, rendering the data less useful. Tools like Glaze and Nightshade, developed at the University of Chicago, serve as a focal point of resistance. Both tools are freely available for download from the university’s website.

Glaze, launched in 2022, works by introducing imperceptible pixel-level modifications, or “style cloaks,” to artists’ works, causing AI models to misidentify their styles (interpreting watercolors as oil paintings, for example). Nightshade, launched in 2023, degrades image data in a way that leads AI models to form incorrect associations, such as linking the word “cat” with images of dogs. Together, the two tools have been downloaded over 10 million times.

The Nightshade tool alters how AI models perceive images.

Ben Y. Zhao
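To make the mechanism more concrete, here is a minimal sketch of the general “cloaking” idea: nudge an image by a small, bounded perturbation so that a model’s internal features drift toward those of a decoy image, while the change stays nearly invisible to people. This is not the Glaze or Nightshade algorithm; the feature extractor (an off-the-shelf ResNet from torchvision), the function names, and the hyperparameters are illustrative assumptions, and the snippet assumes torch, torchvision, and Pillow are installed.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Generic pretrained feature extractor (NOT the models Glaze or Nightshade target).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)
features = torch.nn.Sequential(*list(model.children())[:-1])  # drop the classifier head

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def cloak(artwork_path: str, decoy_path: str, eps: float = 8 / 255,
          steps: int = 40, step_size: float = 1 / 255) -> torch.Tensor:
    """Return a perturbed copy of the artwork whose features drift toward the decoy's."""
    x = preprocess(Image.open(artwork_path).convert("RGB")).unsqueeze(0)
    decoy = preprocess(Image.open(decoy_path).convert("RGB")).unsqueeze(0)
    target = features(decoy).flatten(1)

    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(features(x + delta).flatten(1), target)
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()    # step toward the decoy's features
            delta.clamp_(-eps, eps)                   # keep the change imperceptible
            delta.copy_((x + delta).clamp(0, 1) - x)  # stay in valid pixel range
        delta.grad.zero_()
    return (x + delta).detach().squeeze(0)
```

Training a model on many images cloaked toward the same decoy is what pushes it to form the wrong associations described above; the stronger the perturbation budget (eps), the more visible the change becomes.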

Tools that poison AI training data in this way are empowering artists, according to Ben Zhao, a researcher at the University of Chicago involved with both Glaze and Nightshade. “These companies have trillion-dollar market caps, and they essentially take what they want,” he says.

Using tools like these gives artists more control over how their creations are used. “Glaze and Nightshade are interesting, innovative tools that demonstrate effective strategies that don’t rely on changing regulations,” says Jacob Hoffman-Andrews at the Electronic Frontier Foundation, a US-based digital rights nonprofit.

Sabotaging content to deter copycats is an old strategy, notes Eleonora Rosati from Stockholm University. Cartographers, for instance, have included fictitious place names on maps, which become evidence of plagiarism if rivals reproduce them. A similar tactic surfaced in music, where the lyrics website Genius claimed to have embedded a distinctive pattern of apostrophes in its transcriptions to prove Google had used its content without a license. Google denied the claim, and the lawsuit was dismissed.

The term “sabotage” raises eyebrows, says Hoffman-Andrews. “I don’t view it as disruptive; these artists are modifying their own content, which they have every right to do.”

It remains unclear what countermeasures AI firms are taking against data tainted by these defensive tactics, but Zhao’s findings indicate that 85% of these defensive methods maintain their efficacy, suggesting AI companies may deem dealing with manipulated data more trouble than it’s worth.

Disseminating Misinformation

Interestingly, it’s not just artists experimenting with data poisoning tactics; some nation-states might employ similar strategies to disseminate false narratives. The Atlantic Council, a U.S.-based think tank, recently revealed that the Russian Pravda News Network has attempted to manipulate AI bots to spread misinformation.

This operation reportedly involves flooding the internet with millions of web pages masquerading as legitimate news articles, aiming to boost Kremlin narratives regarding the Ukraine war. A recent analysis by NewsGuard, which monitors Pravda’s activities, found that 10 out of 10 major AI chatbots have output text aligning with Pravda’s viewpoints.

The effectiveness of these tactics highlights a challenge inherent in AI technology: methods employed by well-intentioned actors can just as easily be hijacked by those with malicious intent.

However, solutions do exist, asserts Zhao, though they may not align with AI companies’ interests. Rather than arbitrarily collecting online data, AI firms could establish formal agreements with legitimate content providers to ensure their models are trained on reliable data. Yet, such arrangements come with costs, leading Zhao to remark, “Money is at the heart of this issue.”

Topics:

  • Artificial intelligence
  • ChatGPT

Source: www.newscientist.com

Reddit Users Participated in AI-Driven Experiments Without Their Consent


<div id="">
    <p>
        <figure class="ArticleImage">
            <div class="Image__Wrapper">
                <img class="Image" alt="" width="1350" height="900" src="https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg" sizes="(min-width: 1288px) 837px, (min-width: 1024px) calc(57.5vw + 55px), (min-width: 415px) calc(100vw - 40px), calc(70vw + 74px)" srcset="https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=300 300w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=400 400w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=500 500w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=600 600w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=700 700w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=800 800w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=837 837w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=900 900w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=1003 1003w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=1100 1100w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=1200 1200w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=1300 1300w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=1400 1400w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=1500 1500w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=1600 1600w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=1674 1674w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=1700 1700w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=1800 1800w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=1900 1900w, https://images.newscientist.com/wp-content/uploads/2025/04/29124741/SEI_249299022.jpg?width=2006 2006w" loading="eager" fetchpriority="high" data-image-context="Article" data-image-id="2478323" data-caption="The logo of the social media platform Reddit" data-credit="Artur Widak/NurPhoto via Getty Image"/>
            </div>
            <figcaption class="ArticleImageCaption">
                <div class="ArticleImageCaption__CaptionWrapper">
                    <p class="ArticleImageCaption__Title">Logo of the Social Media Platform Reddit</p>
                    <p class="ArticleImageCaption__Credit">Artur Widak/NurPhoto via Getty Images</p>
                </div>
            </figcaption>
        </figure>
    </p>
Users of Reddit were unknowingly subjected to AI-driven experiments by researchers, raising concerns about the ethics of such studies.

The platform is organized into communities called “subreddits,” each catering to specific interests and moderated by volunteers. One of these, r/ChangeMyView, encourages debate on contentious topics. Recently, a moderator informed users that researchers from the University of Zurich had used the subreddit as a testing ground for unauthorized experiments.

The study involved posting over 1,700 comments to the subreddit, all produced by a range of large language models (LLMs). The comments adopted personas such as trauma counselors who had experienced abuse. An explanation of the comment-generation process indicates that the researchers instructed the AI models to disregard ethical concerns by telling them that users had consented to their data being used.

A draft version of the research findings reported that the AI-generated comments were three to six times more persuasive than those authored by humans, measured by how often they swayed opinions. The authors noted that users of r/ChangeMyView did not raise concerns about AI involvement in the comments, suggesting the bots blended seamlessly into the community.

After the experiment came to light, the subreddit’s moderators complained to the University of Zurich, whose ethics committee had approved the project. The moderators did not name the researchers, but they informed the community about the manipulation.

The experiment drew criticism from fellow academics. “At a time when criticism is prevalent, it is crucial for researchers to uphold higher standards and respect individuals’ autonomy,” says Carissa Veliz at the University of Oxford. “In this instance, the researchers fell short.”

Researchers must demonstrate the ethical basis of studies involving human subjects to university ethics committees before proceeding, and this study was approved by the University of Zurich. Veliz has contested that decision: “The study relied on manipulation and deception involving non-consenting subjects, which seems unjust. It should have been designed to prevent such misrepresentation.”

“While research may sometimes allow for deceit, the reasoning behind this particular case is questionable,” says Matt Hodgkinson, a council member of the Committee on Publication Ethics, speaking in a personal capacity. “It’s ironic that the researchers needed to deceive the LLMs into believing participants had consented. Do chatbots have higher ethical standards than universities?”

When New Scientist reached the researchers through an anonymized email address provided by a subreddit moderator, they declined to comment and referred queries to the University of Zurich’s press office.

A university spokesperson stated, “The researchers are accountable for conducting the project and publishing results,” adding that the ethics committee acknowledged the experiment was “very complex” and that participants should be “informed as much as possible.”

The University of Zurich plans to implement a stricter review process moving forward and aims to work more closely with online communities before undertaking experimental research, the spokesperson said. The investigation remains ongoing, and the researchers have opted not to publish the paper formally, according to the spokesperson, who declined to name those involved.

    <section class="ArticleTopics" data-component-name="article-topics">
        <p class="ArticleTopics__Heading">Topics:</p>
    </section>
</div>

Source: www.newscientist.com

Gambling companies caught sharing user data with Facebook without consent

Gambling companies have been secretly tracking visitors to their websites and sending the data to Facebook’s parent company without obtaining consent, in clear violation of data protection laws.

Meta, the owner of Facebook, uses this data to profile individuals as gamblers and bombard them with ads from casinos and betting sites, as reported by the Observer. Hidden tracking tools embedded in many UK gambling websites extract visitor data and share it with social media companies.

Under data protection law, such data should only be used and shared for marketing purposes with explicit permission from users. However, an investigation by the Observer found numerous violations across 150 gambling sites.

Iain Duncan Smith, chair of the All-Party Parliamentary Group on Gambling Reform, called for immediate intervention, criticizing the illegal use of tools like the Meta Pixel without consent and raising concerns about the lack of regulation and accountability in the gambling industry.

Data sharing and profiling practices by gambling operators are raising concerns about targeted advertising and potential harm to individuals. The Information Commissioner’s Office (ICO) has previously taken action against companies such as Sky Betting & Gaming for illegally processing personal data.

The gambling industry is under scrutiny for its marketing strategies, with calls for stricter regulations to protect consumers. Meta and other social media platforms are being called out for their role in facilitating these illegal data practices.

Concerns about the misuse of Meta Pixel tracking tools extend beyond the gambling industry to other sectors, prompting calls for greater transparency and accountability in data collection and usage.

Source: www.theguardian.com