I find that surprising. The conventional wisdom in my small part of the IT world was that garbage collection is faster. For example, allocation under GC is very simple. You have a big wodge of contiguous memory and you just take a slice off one end for your new object. With manual memory management, you typically have lists of free blocks of memory that have to be manipulated. There are other overheads on deallocation too, plus subtler effects such as keeping related objects together for better cache performance.
Of course, with a garbage collector, a lot of this work still has to happen, but it's deferred until some condition triggers a collection. You may then get slowdowns for brief periods, but these are unlikely to be a serious problem for most general-purpose systems.
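To make the contrast concrete, here is a minimal sketch of the two allocation strategies (mine, purely illustrative; the heap size, the names, and the elided collection logic are placeholders, not any particular collector):

    #include <stddef.h>
    #include <stdint.h>

    /* GC-style allocation: one contiguous region; allocation is a bounds
       check plus a pointer bump. */
    static uint8_t heap[1 << 20];
    static size_t  next_free = 0;

    void *gc_alloc(size_t n)
    {
        if (next_free + n > sizeof heap) {
            /* This is where "some condition triggers a collection":
               collect (or expand the heap), then retry.  Omitted here. */
            return NULL;
        }
        void *p = &heap[next_free];
        next_free += n;                 /* take a slice off one end */
        return p;
    }

    /* Manual-style allocation: search a list of free blocks; freeing a
       block pushes it back onto the list. */
    struct block { size_t size; struct block *next; };
    static struct block *free_list = NULL;

    void *manual_alloc(size_t n)
    {
        for (struct block **bp = &free_list; *bp != NULL; bp = &(*bp)->next) {
            if ((*bp)->size >= n) {
                struct block *b = *bp;
                *bp = b->next;          /* unlink; real allocators also split and coalesce */
                return b + 1;           /* payload follows the header */
            }
        }
        return NULL;                    /* would ask the OS for more memory */
    }

    void manual_free(void *p)
    {
        struct block *b = (struct block *)p - 1;
        b->next = free_list;            /* push the block back onto the free list */
        free_list = b;
    }

The fast path of the first allocator really is just a bounds check and a pointer bump; the harder work (finding the live objects, reclaiming or compacting the rest) is deferred to the collection that fires when the check fails.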
In my experience at the time, precise garbage collection was almost always faster than manual deallocation. On the other hand, few programmers of that time used or cared about precise garbage collection. What they cared about was the performance of languages such as C and C++, which (due to their design) are unable to use precise garbage collection. Most empirical studies of that time therefore compared manual deallocation to the conservative (hence not precise) garbage collectors that were available for C and C++.
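For what it's worth, here is a tiny illustration (mine, not taken from any of those studies) of why a collector for C or C++ cannot be precise: the language lets a program hide a pointer inside an integer, so a collector can only guess which words are references, and it can never safely move an object whose address may have been captured that way.

    #include <stdint.h>
    #include <stdlib.h>

    int main(void)
    {
        int *p = malloc(sizeof *p);

        /* Legal C: the only remaining copy of the pointer is an integer.
           A precise collector would have to know that `disguised` is
           really a reference, but nothing in the language says so. */
        uintptr_t disguised = (uintptr_t)p;
        p = NULL;

        int *q = (int *)disguised;      /* cast back and keep using it */
        *q = 42;

        free(q);
        return 0;
    }

A conservative collector copes by treating every word that looks like a heap address as if it were a pointer, which keeps such objects alive but rules out compaction and can retain garbage.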
For his master's thesis, one of my PhD students constructed an implementation of Scheme that could use any of several different garbage collectors (including conservative, generational precise, and stop-and-copy precise), and used it to compare their performance on a number of allocation-intensive benchmarks. The conservative collector was competitive on some benchmarks, but often not competitive at all.
For his PhD dissertation, he (and I) designed, implemented, and benchmarked two new algorithms for precise generational collection, comparing their performance to the more traditional precise generational and stop-and-copy collectors. I don't think he bothered to benchmark against the conservative collector, but I'd have to check on that before I could be sure.
Another PhD student implemented and tested a novel GC algorithm, which is scalable in the sense of guaranteeing bounded latency regardless of data size and program behavior. We then published a couple of conference papers about that algorithm and its refinements, but the real-time GC world was (and continues to be) far more interested in algorithms that have to be fine-tuned for particular applications than in a general-purpose algorithm whose guaranteed latency is independent of the application program. That goes some way toward explaining why we were never able to get the funding needed to build an industrial-quality implementation of our algorithm.
By the way, that second PhD student (Felix Klock) served as senior lead of the Rust compiler team until 2023 and continues to contribute to the evolution of Rust's design.