
Software quality collapse

Wudang

I thought of posting this to the AI thread but AI is just part of the problem.
Our research found:
We've created a perfect storm: tools that amplify incompetence, used by developers who can't evaluate the output, reviewed by managers who trust the machine more than their people.

But AI wasn't the cause of the big CrowdStrike fubar:
Total economic damage: $10 billion minimum.
The root cause? They expected 21 fields but received 20.
One. Missing. Field.
This wasn't sophisticated. This was Computer Science 101 error handling that nobody implemented. And it passed through their entire deployment pipeline.
Sanitize your inputs.
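For anyone who doesn't write code: the missing check really is a one-liner. A minimal sketch in Python (the delimiter, field count and function name are just for illustration, not CrowdStrike's actual channel-file format):

Code:
# Hypothetical example: validate the field count before touching the data,
# instead of assuming the producer always sends exactly what you expect.
EXPECTED_FIELDS = 21

def parse_record(line: str) -> list[str]:
    fields = line.rstrip("\n").split(",")
    if len(fields) != EXPECTED_FIELDS:
        # Reject or quarantine the record rather than indexing past the end.
        raise ValueError(f"expected {EXPECTED_FIELDS} fields, got {len(fields)}")
    return fields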
 
Somewhere I read an article about programmers building up islands of incompetence. It mirrored my experience of several application programming teams at a major bank. The team lead became so by performing adequately at churning requirements into code while dotting the i's and crossing the t's as per the process. They distrust people who know how to do it better.
I’ll keep trying to find it.
 
Was watching a video about some of OpenAI's recent announcements and, in light of the "levels of abstraction" comment, caught this one:

[attached screenshot]
 
The whole aim of a lot of these companies is, of course, to get rid of expensive humans, and experienced coders and developers are expensive - but if all you need is someone who can type a prompt....

DeepMind has just published this about its AI agent that automatically detects and fixes security issues. https://deepmind.google/discover/blog/introducing-codemender-an-ai-agent-for-code-security/

Apparently it has already contributed 72 security fixes to open-source projects, including catching things like heap buffer overflows. Interestingly, Google does say that human review is required before code submission. So in the end we get back to needing expensive humans - and who is going to invest in low-cost humans (i.e. junior coders) so that in 20 years' time you get the expensive humans (i.e. experienced, good developers) who can do the human reviewing? I think they've added the "requires human review" bit as a temporary fob-off.

Just before I was about to hit post reply I had a thought (occasionally still happens): what a great tool this will be for malicious actors - get it to find security flaws and then use it to create exploits based on those flaws. Zero-day vulnerabilities exploited the day software is released... <gulp>
 
The whole aim of a lot of these companies is, of course, to get rid of expensive humans, and experienced coders and developers are expensive - but if all you need is someone who can type a prompt....
We've used AI to generate code at my company with only limited success.

Since the software systems described in the OP seem to be mostly consumer-facing commodity systems, I wager that time to market is the driving force. Back in the day you could spend a year on a 50,000 line application and update it only yearly for bug fixes. Nowadays it seems like multi-million line programs are going to market on a much faster schedule and being updated on a much faster timetable. Availability seems to be more important than correctness or performance in that sector.

DeepMind has just published this about its AI agent that automatically detects and fixes security issues. https://deepmind.google/discover/blog/introducing-codemender-an-ai-agent-for-code-security/

Apparently it has already contributed 72 security fixes to open-source projects, including catching things like heap buffer overflows.
I might want to give that a try. There are already many static and dynamic code analysis tools out there. I'd like to know how this one is different aside from having the AI moniker attached to it.

Interestingly, Google does say that human review is required before code submission. So in the end we get back to needing expensive humans - and who is going to invest in low-cost humans (i.e. junior coders) so that in 20 years' time you get the expensive humans (i.e. experienced, good developers) who can do the human reviewing? I think they've added the "requires human review" bit as a temporary fob-off.
Our senior software engineers are quite experienced. Most of them are in their 40s and 50s, and quite a lot of them came from the gaming industry. Their management reports that they find about twice as many bugs through manual code review as via static code analysis and QA testing. But then again our software needs are weird: we do embedded systems extensively, but we also have supercomputers. Very little of our software is meant to run on a consumer PC.

Ironically the memory leakages reported in the OP would be considered unacceptable for our supercomputers. Yes, we have scary amounts of RAM in them, but the problems we program almost always still bump up against the limits of the hardware. If one of our custom program solutions on a supercomputer is leaking 30+ gigabytes, that's a show-stopper bug. Some of our programs have to run continuously for up to two weeks on the supercomputers, not just without crashing but also without incurring anomalous performance issues.

From my chair there is still very much a market for diligent, experienced software developers.
 
And given one of the points in the article:


….Under their "strategic collaboration," OpenAI plans to deploy 10 gigawatts of AI accelerators designed with Broadcom. A gigawatt is a measure of power that is increasingly being used to describe the biggest clusters of AI processors…..
 
At my company there's also a big push for AI, and most of my team members were quite skeptical (I work in a great team :)). The company is providing strict guidelines on how to use AI and has set up internal platforms to avoid leaking any proprietary software to external models. AI-generated software must stay below a certain percentage of the total lines, and so on. All quite encouraging.

I have been playing with it a bit and I must say I'm reasonably impressed by what it can do. I'm learning Python and I asked the AI to set up an initial program that can take some arguments and perform some async IO stuff. That worked really well. Next, putting the real intelligence into the code was less great, but I'm totally fine with that. For many tasks you need a ton of uninteresting boilerplate code and scaffolding, and it can take that out of your hands and save you some time. We have used it to add a variable to our code base. This requires touching 5 files in different places, and it does that in 30 seconds instead of 10 minutes of fiddling to remember where it should all go. The code still gets reviewed and tested manually after that.
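To give an idea of the kind of scaffolding I mean, it produced something roughly like this (a toy sketch from memory, not our actual code; the function name and the HTTP probing are made up for the example):

Code:
# Toy scaffold: argparse for the arguments, asyncio for the concurrent I/O.
import argparse
import asyncio

async def fetch_status(host: str, port: int = 80) -> str:
    # Open a connection, send a minimal HTTP request, return the status line.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())
    await writer.drain()
    status = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return f"{host}: {status.decode().strip()}"

async def main(hosts: list[str]) -> None:
    # Run all the requests concurrently instead of one after another.
    for result in await asyncio.gather(*(fetch_status(h) for h in hosts)):
        print(result)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="toy async I/O scaffold")
    parser.add_argument("hosts", nargs="+", help="hosts to query")
    asyncio.run(main(parser.parse_args().hosts))

You run it as something like "python probe.py example.com example.org". The real intelligence, as I said, still had to come from me.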

It can do code reviews. A lot of the comments are noise. Not that the remarks are wrong, but it lacks some context that humans have. Still, it did point out a few interesting issues. I was impressed by a form of complex reasoning where it pointed out that normally you shouldn't set file permissions that wide, but because this other part seems to indicate it handles public information, it's probably OK. That's actual reasoning!

I must admit I'm starting to warm up to AI. It's a great smart code secretary that can help us, but it can't replace us. Yet...
 
"The code still gets reviewed and tested manually after that."
That's a key part. I helped write test code and cases for IBM's CICS, MQ and other products, and everything went through massive testing and regression testing. The amount of code (I know) written to test the products was significantly larger than the products themselves.
Later, working as a developer writing Java webapps, we built a big project template with all the mocking etc. ready, so everything had unit tests plus larger tests, with SonarQube reviewing it all. We still turned fixes and changes around quickly.
I'd say the article's main point is that AI is the straw that threatens to break a software ecosystem that is already creaking because of a lack of these factors.
 
And it really doesn't help now that what would have been beta versions of apps a few years back are released as release candidates and this is considered normal and indeed often seems to be treated as a virtue!
 
And it really doesn't help now that what would have been beta versions of apps a few years back are released as release candidates and this is considered normal and indeed often seems to be treated as a virtue!
Yes, often a big misunderstanding of the original "move fast and break things", which was about small changes, easily backed out or fixed, deployed to a few instances at first.
 
I remember hearing this same crap about the newfangled object-oriented programming back in the day, and it was old then because people had been hearing it about compilers for decades. Look at that guy's timeline: he thinks software bloat only started becoming a problem in 2018.
 
And it really doesn't help now that what would have been beta versions of apps a few years back are released as release candidates and this is considered normal and indeed often seems to be treated as a virtue!
That's been happening in the gaming space since the early noughties. Once the Xbox came out with an ethernet port and the expectation that you'd be persistently online, the game devs quickly brought consoles in line with PC titles and were chucking half-finished crap out of the door with release-day patches.
 
I just started Apple Calculator on my Mac and it's currently running at 61 MB*. This "it leaks 32 GB" is really meaningless. That was probably the amount it had leaked before the person running it noticed there was a problem. I generally leave Calculator running all the time and so, if there is a leak, it's bound to cause problems eventually. The real question is how fast it leaks memory.

* a trifling amount in today's terms but unimaginable to 13 year old me who started programming on a computer with 16k.
 
That's been happening in the gaming space since the early noughties. Once the Xbox came out with an ethernet port and the expectation that you'd be persistently online, the game devs quickly brought consoles in line with PC titles and were chucking half-finished crap out of the door with release-day patches.
Well I do remember once arguing with Nintendo about a clipping bug on the N64: apparently there was one area where, if you ran at the wall something like 30-plus times, you could drive out of the world. Should that have been fixed before shipping? My view was that it wasn't a stop-ship bug, as no one in their right mind would do that. (It was, by the way, a terrible game; god knows why anyone would ever have bought it. The producer had a cut-out on his desk that showed it as "No 1 Game of the Year" - the context of the article that he didn't cut out was that it was the No 1 WORST title of the year. It was a contractual obligation that we had a game with its title approved by Nintendo by a certain date. As ever the mighty dollar - or actually francs - was the "quality" bar.)
 
I won't talk about my part in the alpha of what became IBM Websphere as people tend to throw rocks at me. Customers actually paid large sums to be in the alpha program which was a POS but woe betide anyone who even tried hinting there were problems.
 
I just started Apple Calculator on my Mac and it's currently running at 61 MB*. This "it leaks 32 GB" is really meaningless.
On macOS the ps -l command is showing me 34.6 GB vsize on a newly opened Calculator app after a few calculations. It's showing a resident set size of about 112 MB. The "resident set" is the set of virtual memory pages that are actually resident in RAM, generally corresponding to the minimum amount of memory for code and data needed to run the program at that given second.

Virtual memory size is meaningless for the purpose of determining resource usage. All the processes running on my MacBook have virtual sizes between 30 and 35 GB. There are many reasons why the Darwin kernel would report this without the process actually using that much memory. I wonder if that's the measurement they're (mistakenly) using.
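If anyone wants to see the gap for themselves, something like this prints both numbers for the current process (it assumes the third-party psutil package, which is not part of the standard library):

Code:
# Compare virtual size (vms) with resident set size (rss) for this process.
# Requires: pip install psutil
import psutil

mem = psutil.Process().memory_info()
print(f"virtual size: {mem.vms / 2**30:7.1f} GiB")  # huge on macOS, mostly unused address space
print(f"resident set: {mem.rss / 2**20:7.1f} MiB")  # pages actually sitting in RAM

Only the second number tells you anything about actual RAM pressure.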

That was probably the amount it had leaked before the person running it noticed there was a problem. I generally leave Calculator running all the time and so, if there is a leak, it's bound to cause problems eventually. The real question is how fast it leaks memory.
Yes, this. As I mentioned up-thread, the programs we run on our supercomputers run for up to two weeks and simply can't leak because even small leaks add up rapidly. It's a paradox that our programmers working on some of the most powerful computers available have to be just as resource-conscious as someone programming an embedded system. One of our systems has a RAM capacity of 4 petabytes, and that's not even within the ballpark of what's running at, say, the Dept. of Energy sites today.

Leaving a program running in the background waiting for human input is unlikely to result in a program state that multiplies memory leaks. It's mostly idle. But it's absolutely the case that a program that's doing work in the background can get into a state where it's leaking memory at a measurable rate. We had one program that got into a state where it was leaking 32 kB for every message it received from another node in the computer. That translated to a one-line error in the source code.
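For illustration only (and in Python, rather than the language that system is written in), the classic one-line leak in a long-running service looks something like an unbounded cache keyed by a value that never repeats:

Code:
# Illustrative sketch of a "one-line" leak: every message is cached under a
# unique key and nothing ever evicts old entries, so memory grows per message.
_seen: dict[str, bytes] = {}

def handle_message(msg_id: str, payload: bytes) -> None:
    _seen[msg_id] = payload  # the leaking line
    # ... real processing of payload would happen here ...

Run that against a steady stream of messages and the growth is slow, linear and invisible until the node runs out of RAM.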

* a trifling amount in today's terms but unimaginable to 13 year old me who started programming on a computer with 16k.
Indeed. The first computer I did engineering on was an IBM System 370 with a whopping 8 megabytes of RAM. We spent a day in meetings trying to justify to management an upgrade to 12 megabytes. We eventually upgraded to a System 3084 and finally a System 3090 with a whopping 6 CPUs and 32 megabytes per CPU. That was luxury.
 
As I mentioned up-thread, the programs we run on our supercomputers run for up to two weeks and simply can't leak because even small leaks add up rapidly. It's a paradox that our programmers working on some of the most powerful computers available have to be just as resource-conscious as someone programming an embedded system. One of our systems has a RAM capacity of 4 petabytes, and that's not even within the ballpark of what's running at, say, the Dept. of Energy sites today.
Have you considered writing in a memory safe language such as Logo or Python? :rntongue:
 
