Pornhub and Three Other Adult Websites Undergo EU Child Safety Evaluation

European officials have initiated an investigation into four adult websites suspected of inadequately preventing minors from viewing adult content.

Following a review of the companies’ policies, the European Commission criticized Pornhub, StripChat, XNXX, and XVideos for not implementing adequate age verification procedures to block minors from accessing their sites.

This inquiry has been launched in accordance with the EU’s Digital Services Act (DSA), a comprehensive set of regulations aimed at curbing online harm such as disinformation, cyber threats, hate speech, and counterfeit merchandise. The DSA also enforces stringent measures to safeguard children online, including preventing mental health repercussions from exposure to adult materials.

The commission noted that all four platforms relied on a simple one-click self-certification for age verification.

“Today marks a significant step toward child protection online in the EU, as the enforcement action we are initiating… clearly indicates our commitment to hold four major adult content platforms accountable for effectively safeguarding minors under the DSA,” the commission said.

While no specific deadline has been set for concluding the investigation, officials stressed that they aim to act swiftly on potential next steps based on the platforms’ responses.

The platforms can resolve the investigation by implementing an age verification system recognized as effective by EU regulators. Failure to comply could result in fines of up to 6% of their global annual revenue.

Under the DSA, the European Commission directly oversees platforms with more than 45 million users in the EU, including Google, Meta, and X, while national authorities in each of the 27 member states are responsible for those that fall below this threshold.

On Tuesday, the commission announced that StripChat no longer qualifies as a “very large online platform.” Following the company’s appeal, oversight of the platform will pass from Brussels to authorities in Cyprus, where its parent company, Techinius Ltd, is based.

However, this new designation will not take effect until September, meaning that the investigation into age verification remains active.

The child protection responsibilities of StripChat will continue unchanged.

Aylo FreeSites, the parent company of Pornhub, is aware of the ongoing investigation and has stated its “full commitment” to ensuring the online safety of minors.

“We are in full compliance with the law,” the company remarked. “We believe the effective way to protect both minors and adults is to verify user age at the point of access through their device, ensuring that websites provide or restrict access to age-sensitive content based on that verification.”

Techinius has been approached for comment, as has a Brussels-based attorney who has recently represented the parent companies of XVideos (Web Group Czech Republic) and XNXX (NKL Associates) in EU legal matters.

Source: www.theguardian.com

ARC-AGI-2: Leading AI Models Fall Short in Latest Artificial General Intelligence Evaluation


The ARC-AGI-2 benchmark is designed to be a difficult test for AI models

Just_Super/Getty Images

The most sophisticated AI models available today score poorly on a new benchmark designed to measure progress towards artificial general intelligence (AGI), and brute-force computing power will not be enough to improve those scores, because the evaluation also takes into account the cost of running each model.

There are many competing definitions of AGI, but it is generally taken to mean an AI capable of performing any cognitive task that humans can do. To measure this, the ARC Prize Foundation previously created a test of reasoning ability called ARC-AGI-1. Last December, OpenAI announced that its o3 model had scored highly on that test, prompting some to ask whether the company was close to achieving AGI.

But now a new test, ARC-AGI-2, has raised the bar. While the current AI systems on the market struggle to score more than single digits out of 100 on the test, every question has been solved by at least two humans in under two attempts.

In a blog post introducing ARC-AGI-2, ARC Prize Foundation president Greg Kamradt said a new benchmark was needed to test skills that differ from those measured by the previous iteration. “To beat it, you need to demonstrate both high levels of adaptability and high efficiency,” he writes.

The ARC-AGI-2 benchmark differs from many other AI benchmarks in that it focuses not on the ability to match the performance of the world’s leading PhDs, but on the ability to complete simple tasks, such as replicating changes to a new image based on past examples of symbolic interpretation. Current models are strong at the “deep learning” skills measured by ARC-AGI-1, but fare poorly on ARC-AGI-2’s seemingly simple tasks, which require more demanding reasoning and interaction. For example, OpenAI’s o3-low model scored 75.7 per cent on ARC-AGI-1, but only 4 per cent on ARC-AGI-2.

The benchmark also adds a new dimension to measuring AI capability by examining the efficiency of problem solving, as measured by the cost required to complete a task. For example, ARC paid a human tester $17 per task, while it estimates that o3-low would cost $200 to complete the same task.

“I think this new iteration of ARC-AGI, which now focuses on balancing performance and efficiency, is a major step towards a more realistic evaluation of AI models,” says Joseph Imperial at the University of Bath, UK. “It is a sign that we are moving from one-dimensional evaluations that focus only on performance to ones that also consider the computing power required.”

Models that can pass ARC-AGI-2 will need to be not only highly capable but also smaller and lighter, says Imperial, because model efficiency is a key component of the new benchmark. This could help address concerns that AI models are becoming ever more energy-intensive, sometimes to the point of waste, without achieving correspondingly better results.

However, not everyone is convinced that the new measure will be useful. “The whole framing of this as a test of intelligence is not the correct framing,” says Catherine Flick at Staffordshire University, UK. These benchmarks simply assess an AI’s ability to complete a single task or set of tasks well, she says, and are then extrapolated to imply general capability across that kind of task.

Performing well on these benchmarks should not be seen as a landmark moment for AGI, says Flick.

Another question is what will happen if, or when, ARC-AGI-2 is beaten. Will yet another benchmark be needed? “If they develop ARC-AGI-3, I guess they’ll add another axis to the graph: [the] minimum number of humans, whether expert or not, it would take to solve a task, in addition to performance and efficiency,” says Imperial. In other words, discussions about AGI are unlikely to be settled any time soon.


Source: www.newscientist.com