AI trends, updates and resources to power your practice.
Reality caught up with AI
Autumn leaves may be falling, but compliance stakes are only rising. Year-end crunch time is here, and with tax deadlines looming, our expectations of AI are no longer about what it can do – they're about what it must do. The real challenge now is turning AI from an experiment into an enterprise-safe tool.
This month, explore how the accounting profession is grounding innovation in accountability, balancing human judgment with digital precision and ensuring that every new system stands up to scrutiny.
For deeper insights on putting AI to work – safely, strategically and with confidence – explore CPA.com's multi-part AI initiative.
What's in focus this month
- Tax code meets its match
- Accountability meets acceleration
- The AI confidence trap
- The dark side of technology
Tax code meets its match
What's new
A new benchmark, TaxCalcBench, was introduced to formally evaluate the ability of frontier AI models to calculate U.S. personal income taxes. The results are stark: even state-of-the-art models like Gemini 2.5 Pro and Claude Opus 4 cannot perform the task reliably, correctly calculating less than one-third of the federal tax returns in a simplified test set.
How it works
TaxCalcBench uses a dataset of 51 distinct federal tax scenarios. For each scenario, a model is given a complete set of user inputs in JSON format and is prompted to calculate the tax return. Its output is then compared, line by line, against the correct results generated by a traditional, deterministic tax engine. The evaluation measures both strict accuracy, where every line must match exactly, and lenient accuracy, which allows a variance of +/- $5 per line.
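The line-by-line comparison can be sketched in a few lines of Python. This is a hypothetical illustration of the strict-versus-lenient scoring idea, not the benchmark's actual code; the line names, dollar amounts and function signature are invented for the example.

```python
# Hypothetical sketch of TaxCalcBench-style scoring. Line names
# ("agi", "total_tax", ...) and values are invented; the real
# benchmark compares full per-line tax return output.

def score_return(model_lines, truth_lines, tolerance=5.0):
    """Compare a model's per-line tax return values against ground truth.

    Returns (strict_pass, lenient_pass): strict requires every line to
    match exactly; lenient allows +/- $tolerance per line.
    """
    strict = all(model_lines.get(k) == v for k, v in truth_lines.items())
    lenient = all(
        model_lines.get(k) is not None
        and abs(model_lines[k] - v) <= tolerance
        for k, v in truth_lines.items()
    )
    return strict, lenient

truth = {"agi": 64250.0, "taxable_income": 50400.0, "total_tax": 5741.0}
model = {"agi": 64250.0, "taxable_income": 50400.0, "total_tax": 5744.0}

print(score_return(model, truth))  # (False, True): strict fails, lenient passes
```

A $3 error on a single line is enough to fail the strict check while passing the lenient one – which is why the paper's headline numbers look so different under the two criteria.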
Behind the news
Tax calculation has historically been the domain of highly specialized, deterministic software “engines.” Translating the 75,000+ pages of U.S. tax code into flawless code is a monumental challenge that few have conquered. While AI demos have hinted at tax capabilities, TaxCalcBench is the first academic-style benchmark to move beyond hype and formally test performance against the ground truth of a professional tax engine.
Why it matters
For accountants, CPAs and finance professionals, these findings are a critical reality check on the present capabilities of AI. The study confirms that the nuanced, high-stakes work of tax compliance is not yet automatable by general-purpose large language models (LLMs). The models' primary failure points – improperly using tax tables, making fundamental calculation errors and failing to determine eligibility for common credits – underscore the continued necessity of professional human oversight from a trusted advisor and the value of certified, deterministic software. The accuracy demanded by the IRS is still well beyond the reach of today's AI.
Our thinking
The takeaway isn't that LLMs are irrelevant, but that their role is one of augmentation, not replacement. The path forward will likely involve hybrid systems where AI assists with initial data interpretation and sub-tasks, but critical calculations and final validation remain the purview of deterministic engines and human experts. For the finance industry, the focus should be on building the scaffolding and orchestration required to leverage AI's strengths while containing its weaknesses, ensuring that a human professional remains firmly in the loop.
Accountable acceleration: Gen AI fast-tracks into the enterprise
What's new
The 2025 Wharton-GBK AI Adoption Report confirms GenAI has entered mainstream enterprise use, with 82% of leaders using it weekly and 46% daily. ROI discipline is now standard: 72% track GenAI returns and ~75% report positive results. Data-intensive functions such as finance are emerging as early ROI leaders, signaling a shift from experimentation to accountable AI deployment.
How it works
The study surveyed more than 800 senior leaders across U.S. enterprises (>1,000 employees; >$50M revenue). Top use cases center on analytics, forecasting and document generation. Roughly one-third of GenAI budgets are now devoted to internal R&D, showing companies are moving beyond off-the-shelf tools toward tailored solutions. ROI tracking focuses on productivity, profitability and operational throughput.
Behind the news
This is the third year of the Wharton-GBK enterprise AI study, which shows a clear acceleration from 37% weekly use in 2023 to 72% in 2024 and 82% in 2025. Nearly nine in ten companies (88%) expect to increase GenAI budgets in the next year, and 62% anticipate growth of more than 10% over the next two to five years. Chief AI Officers now appear in 60% of organizations, with many executives expanding existing responsibilities rather than taking on new standalone roles.
Why it matters
The narrative for accounting and finance is clear: GenAI is becoming a productivity multiplier, not a replacement. Eighty-nine percent of leaders believe AI enhances skills, even as 43% worry about over-reliance diminishing human expertise. Finance teams are applying GenAI in workflows like data analysis, planning support and document automation, while maintaining judgment and oversight over high-stakes decisions. This positions finance to help shape AI governance and ROI frameworks company-wide.
Our thinking
Finance's dual mandate – adopting AI to drive efficiency while governing its impact on value creation – puts the function at the center of the AI shift. To stay ahead, finance leaders should build AI fluency, reinforce human-judgment skills and lean in to strategic roles in risk, ethics and ROI accountability. AI may automate tasks, but finance teams that orchestrate intelligence, not just produce it, will set the pace for enterprise performance.
The AI mirror: Competence in the age of co-pilots
What's new
A recent study on human-AI interaction has uncovered a startling cognitive distortion: When using AI, everyone overestimates their performance. This “Reverse Dunning-Kruger” effect shows that even AI-savvy experts become more overconfident, flipping the traditional script on self-assessment.
How it works
Researchers observed participants using AI for problem-solving tasks and found that AI acts as a great equalizer of confidence. Both novices and experts, when aided by AI, believed their performance was significantly better than it actually was, with the most skilled users showing the largest jump in overconfidence.
Behind the news
This stands in stark contrast to the well-documented Dunning-Kruger effect, a cognitive bias where people with low ability at a task overestimate their ability, and experts tend to underestimate theirs. The introduction of an AI collaborator seems to shatter this pattern, creating a new blind spot where the AI's perceived competence is conflated with our own.
Why it matters
For the accounting and finance profession, this is a red flag. As we integrate AI co-pilots and generative tools into our workflows for financial modeling, tax analysis and auditing, we risk becoming dangerously overconfident in the outputs. The AI's fluency can mask subtle but critical errors, and our own expertise, our primary defense, is being psychologically undermined. An unchecked assumption that the AI is “right” could lead to flawed financial strategies, inaccurate reporting and a fundamental erosion of professional judgment.
Our thinking
The future of financial expertise isn't about letting the AI take the wheel; it's about learning to be a better driver with a powerful navigation system. This new cognitive trap demands a radical form of vigilance. We must cultivate a healthy skepticism, not just of the technology, but of our own judgment when using it. The most valuable skill for the accounting or finance professional of tomorrow won't be just using the AI, but knowing precisely when and how to question it – and yourself. It's time to build the mental models for critically evaluating AI-assisted work before we get lulled into a false sense of security.
The dark side of technology
What's new
A brief detour from our usual focus on AI, because this is too important to ignore. Nation-state hackers have found a new, brutally effective use for a technology once hailed as the future of trust: using blockchains to distribute malware. The technique, dubbed “EtherHiding,” embeds malicious code into smart contracts, making it permanent and virtually impossible to remove.
How it works
Attackers deploy a smart contract on a public blockchain like Ethereum with the malware tucked inside. The blockchain's core features are then weaponized: decentralization means there is no central server to shut down; immutability means the malicious code cannot be altered or deleted; and pseudonymous transactions shield the attackers, while retrieving the malware leaves no trace in traditional server logs. It's the ultimate form of bulletproof hosting.
Behind the news
Remember 2018? Every conference was buzzing with the utopian promise that blockchains would replace the internet, ushering in an era of decentralized trust. This is the risk that followed that early hype. The very architectural features that were supposed to create a trustless paradise – immutability and decentralization – are now being exploited to create a permanent, untouchable haven for malicious code.
Why it matters
For finance and accounting professionals, this is a direct assault on the foundational concept of the “immutable ledger.” The pitch was that the blockchain was a source of ultimate truth. Now, that truth can be poisoned. This fundamentally changes the risk calculus for any engagement with blockchain technology, from cryptocurrency custody and DeFi platforms to supply chain finance. The “feature” of immutability has become a critical security “bug,” and due diligence on smart contracts just went from a technical checkbox to a mission-critical security audit.
Our thinking
We were sold the dream of a trustless system. What we got is a new attack vector where the system itself protects the attacker. For leaders in accounting and finance, this is a stark lesson: every new technology brings with it a dark side. The critical skill is no longer just understanding how a technology works, but anticipating how it can be broken and misused. Your value isn't in being a cheerleader for innovation, but in being the professional skeptic who can distinguish a robust application from a ticking time bomb. The game has changed from adoption to critical, informed vigilance.
Did you find this month's content valuable?
CPA.com
1345 Avenue of the Americas, 27th Floor
New York, NY 10105
888.777.7077