AI coding tools show reliability gaps in structured output tasks

Tuesday 17 March 2026 - 16:00

A new study from the University of Waterloo finds that leading artificial intelligence coding tools still fail in roughly one out of four cases when generating structured outputs, raising concerns about their reliability in real-world software development workflows.

The research, released on March 16 and scheduled for presentation at the International Conference on Learning Representations 2026, evaluated 11 large language models across 18 structured output formats and 44 tasks. Even the best-performing proprietary systems reached only about 75 percent accuracy, while the top open-source models came close to 67 percent.

Structured output remains a critical weak point

The study, titled “StructEval: Benchmarking LLMs’ Capabilities to Generate Structural Outputs,” focused on formats commonly used in development pipelines, including JSON, YAML, CSV, HTML, React and SVG. These formats are essential for integrating AI-generated code into production systems.

Researchers assessed model outputs using a combination of syntax validation, keyword matching and visual question answering. The results showed that while models performed reasonably well on text-based tasks such as documentation and simple data structures, they struggled with more complex outputs.
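The article does not describe how StructEval implements its checks. As a rough illustration of the first two ideas only, the sketch below treats a JSON output as passing syntax validation if it parses, and measures keyword matching as the fraction of required key paths present. The function name `evaluate`, the dotted-path convention and the returned fields are all hypothetical choices for this example, not the benchmark's actual code, and visual question answering is not modeled here.

```python
import json


def _has_path(data, path: str) -> bool:
    """Check whether a dotted key path (hypothetical convention, e.g. 'user.name') exists."""
    node = data
    for part in path.split("."):
        if isinstance(node, dict) and part in node:
            node = node[part]
        else:
            return False
    return True


def evaluate(output: str, required_keys: list[str]) -> dict:
    """Illustrative two-stage check: syntax validation, then keyword (key-path) matching."""
    try:
        data = json.loads(output)  # stage 1: does the output parse at all?
    except json.JSONDecodeError:
        # Unparseable output cannot satisfy any keyword requirement.
        return {"syntax_ok": False, "keyword_coverage": 0.0}
    if not required_keys:
        return {"syntax_ok": True, "keyword_coverage": 1.0}
    found = sum(_has_path(data, p) for p in required_keys)  # stage 2
    return {"syntax_ok": True, "keyword_coverage": found / len(required_keys)}


if __name__ == "__main__":
    keys = ["user.name", "user.id"]
    print(evaluate('{"user": {"name": "Ada", "id": 1}}', keys))  # complete output
    print(evaluate('{"user": {"name": "Ada"}}', keys))           # valid but incomplete
    print(evaluate('{"user": {"name": "Ada"', keys))             # truncated, fails to parse
```

Separating the two stages reflects the distinction the researchers draw: an output can be syntactically valid yet still miss what the task asked for.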

Tasks involving visual or layout elements, including image generation, video content, dynamic web design and diagram code, produced the highest error rates. The study also found that generation tasks, where models convert natural language instructions into structured formats, were significantly more difficult than conversion tasks between existing formats.

Human oversight remains essential

The research team included Dongfu Jiang, Jialin Yang and Wenhu Chen, supported by a group of 17 contributors involved in annotation and evaluation. According to Jiang, the study measured both syntactic correctness and whether outputs meaningfully addressed the task.

He noted that despite rapid advances, AI coding systems still require close human supervision. Developers using these tools cannot rely solely on automated outputs, particularly in environments where precision is critical.

Chen emphasized the collaborative research model at Waterloo, where students contribute to and lead benchmarking efforts, reflecting a broader trend in AI development that combines experimentation with evaluation.

Widespread adoption meets practical limitations

The findings come at a time when AI-assisted coding tools have become deeply embedded in software engineering workflows. A recent survey by The Pragmatic Engineer indicates that 95 percent of respondents use AI tools at least weekly, and 75 percent rely on them for at least half of their engineering tasks.

Platforms such as GitHub Copilot, Claude Code and Cursor are now standard in many development environments. However, the Waterloo study highlights a key risk: errors in structured outputs may not always be immediately visible, increasing the likelihood of hidden bugs or configuration issues.

In complex systems, such issues can propagate and lead to broader failures, making validation and review processes more important than ever.
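To show why such errors can stay hidden, consider a configuration file produced by an AI tool that parses cleanly but stores a port number as a string. The sketch below is a minimal, hypothetical validation gate using only Python's standard library; the expected keys and types in `EXPECTED_TYPES` are invented for illustration and do not come from the study.

```python
import json
from typing import Any

# Hypothetical expected shape of a generated service config: key -> required type.
EXPECTED_TYPES: dict[str, type] = {"service": str, "port": int, "replicas": int, "debug": bool}


def validate_config(text: str, expected: dict[str, type] = EXPECTED_TYPES) -> list[str]:
    """Return a list of problems found; an empty list means the config passed."""
    try:
        data: Any = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"syntax error: {exc.msg} at line {exc.lineno}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for key, typ in expected.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif typ is int and isinstance(data[key], bool):
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"wrong type for {key}: expected int, got bool")
        elif not isinstance(data[key], typ):
            problems.append(
                f"wrong type for {key}: expected {typ.__name__}, got {type(data[key]).__name__}"
            )
    return problems


if __name__ == "__main__":
    generated = '{"service": "api", "port": "8080", "replicas": 2, "debug": false}'
    print(validate_config(generated))  # ['wrong type for port: expected int, got str']
```

A syntax check alone would accept this output; only the type check catches the string-valued port before it reaches a running system.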

The study has been published in Transactions on Machine Learning Research and contributes to ongoing discussions about the role of large language models in production-grade software development.
