AI coding tools show reliability gaps in structured output tasks

Yesterday 16:00

A new study from the University of Waterloo finds that leading artificial intelligence coding tools still fail in roughly one out of four cases when generating structured outputs, raising concerns about their reliability in real-world software development workflows.

The research, released on March 16 and scheduled for presentation at the International Conference on Learning Representations 2026, evaluated 11 large language models across 18 structured output formats and 44 tasks. Even the best-performing proprietary systems reached only about 75 percent accuracy, while the top open-source models achieved close to 67 percent.

Structured output remains a critical weak point

The study, titled “StructEval: Benchmarking LLMs’ Capabilities to Generate Structural Outputs,” focused on formats commonly used in development pipelines, including JSON, YAML, CSV, HTML, React and SVG. These formats are essential for integrating AI-generated code into production systems.

Researchers assessed model outputs using a combination of syntax validation, keyword matching and visual question answering. The results showed that while models performed reasonably well on text-based tasks such as documentation and simple data structures, they struggled with more complex outputs.
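As a rough illustration of the first two checks, the sketch below scores a JSON output on syntax validity and keyword coverage. The function name `score_json_output`, the returned fields and the sample keywords are assumptions made for illustration; they are not taken from the StructEval paper, whose actual scoring may differ.

```python
import json


def score_json_output(output: str, required_keywords: list[str]) -> dict:
    """Score a model's JSON output (hypothetical helper, not the study's code)."""
    # Syntax validation: the raw output must parse as JSON.
    try:
        json.loads(output)
        syntax_ok = True
    except json.JSONDecodeError:
        syntax_ok = False

    # Keyword matching: fraction of required keywords found in the raw text.
    if required_keywords:
        hits = sum(1 for kw in required_keywords if kw in output)
        keyword_score = hits / len(required_keywords)
    else:
        keyword_score = 1.0

    return {"syntax_ok": syntax_ok, "keyword_score": keyword_score}


if __name__ == "__main__":
    result = score_json_output('{"name": "Ada", "age": 36}', ["name", "age", "email"])
    print(result)
```

An output can pass one check and fail the other, which is why the two are reported separately here: truncated JSON may still contain every keyword, and valid JSON may omit fields the task asked for.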

Tasks involving visual or layout elements, including image generation, video content, dynamic web design and diagram code, produced the highest error rates. The study also found that generation tasks, where models convert natural language instructions into structured formats, were significantly more difficult than conversion tasks between existing formats.

Human oversight remains essential

The research team included Dongfu Jiang, Jialin Yang and Wenhu Chen, supported by a group of 17 contributors involved in annotation and evaluation. According to Jiang, the study measured both syntactic correctness and whether outputs meaningfully addressed the task.

He noted that despite rapid advances, AI coding systems still require close human supervision. Developers using these tools cannot rely solely on automated outputs, particularly in environments where precision is critical.

Chen emphasized the collaborative research model at Waterloo, where students contribute to and lead benchmarking efforts, reflecting a broader trend in AI development that combines experimentation with evaluation.

Widespread adoption meets practical limitations

The findings come at a time when AI-assisted coding tools have become deeply embedded in software engineering workflows. A recent survey by The Pragmatic Engineer indicates that 95 percent of respondents use AI tools at least weekly, and 75 percent rely on them for at least half of their engineering tasks.

Platforms such as GitHub Copilot, Claude Code and Cursor are now standard in many development environments. However, the Waterloo study highlights a key risk: errors in structured outputs may not always be immediately visible, increasing the likelihood of hidden bugs or configuration issues.

In complex systems, such issues can propagate and lead to broader failures, making validation and review processes more important than ever.
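One way a team might apply such validation is to reject AI-generated configuration before it reaches the rest of a system. The following is a minimal sketch, assuming the output is JSON; the names `StructuredOutputError` and `load_validated_config` and the example fields `host` and `port` are hypothetical and do not come from the study.

```python
import json


class StructuredOutputError(ValueError):
    """Raised when generated output fails validation (hypothetical name)."""


def load_validated_config(raw: str, required: dict[str, type]) -> dict:
    """Parse raw JSON text and check that required fields exist with the expected types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise StructuredOutputError(f"invalid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise StructuredOutputError("expected a JSON object at the top level")

    for key, expected_type in required.items():
        if key not in data:
            raise StructuredOutputError(f"missing field: {key}")
        if not isinstance(data[key], expected_type):
            raise StructuredOutputError(
                f"field {key!r} should be {expected_type.__name__}, "
                f"got {type(data[key]).__name__}"
            )
    return data


if __name__ == "__main__":
    schema = {"host": str, "port": int}
    # Syntactically valid JSON, but the port is a string: a quiet error
    # that a syntax check alone would not catch.
    try:
        load_validated_config('{"host": "db.local", "port": "5432"}', schema)
    except StructuredOutputError as err:
        print(f"rejected: {err}")
```

The example input parses cleanly yet carries the wrong type for `port`, the kind of error that is not immediately visible. A production setup would more likely use a full schema validator than hand-written checks like these.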

The study has been published in Transactions on Machine Learning Research and contributes to ongoing discussions about the role of large language models in production-grade software development.

