
Old Computer Architectures

a_unique_person

There was one design that I thought was really radical, but ahead of its time, as the technology of the day would not have been able to do it justice. These days, however, I think it would be perfect.

That is the idea behind the TMS 9900. Instead of screwing around with registers, just do away with them altogether. Registers were really just a manual cache, somewhere to hold the data you were working on rather than having to perform a very slow read/write to memory.

With the 9900, everything is a memory-to-memory operation. The only on-chip registers are the program counter, the status register, and a workspace pointer that says where in memory the "register file" lives. At the time, it must have slowed things down a lot. These days, with register renaming and caches, you could have what is effectively a 'level 0' cache that holds the results of the last few operations. That solves the register problem (although current technology almost makes registers virtual anyway).
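Roughly, the 9900's trick is that "registers" R0-R15 are just 16 consecutive words of ordinary RAM located by the workspace pointer. A toy sketch of the idea in C (the sizes and addresses here are made up for illustration, not TI's actual layout):

#include <stdint.h>
#include <stdio.h>

/* Toy model of the TMS 9900 scheme: the "registers" R0..R15 are just
   16 consecutive words in ordinary RAM, located by a workspace pointer. */
static uint16_t ram[32768];      /* 64 KB of RAM as 16-bit words */
static uint16_t wp = 0x0100;     /* workspace pointer (word index, invented) */

static uint16_t *reg(int n)      /* Rn lives at ram[wp + n] */
{
    return &ram[wp + n];
}

int main(void)
{
    *reg(1) = 40;
    *reg(2) = 2;
    *reg(0) = *reg(1) + *reg(2);          /* every "register" op is a memory op */
    printf("R0 = %u\n", (unsigned)*reg(0));

    wp = 0x0200;                          /* "context switch": a whole new register file */
    *reg(0) = 7;
    printf("after switching workspaces, R0 = %u\n", (unsigned)*reg(0));
    return 0;
}

Changing the workspace pointer swaps in an entire new "register file" at once, which is why context switches were cheap on the 9900.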
 
The main purpose of registers is speed. Registers, being on the CPU chip and running at the CPU clock, are way faster than memory. And, of course, the control of them is micro-coded, another speed advantage. Declaring the right variables "register" in C can really speed up your program.
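For reference, that hint looks like this in C; a minimal sketch, bearing in mind that modern compilers generally do their own register allocation and are free to ignore the keyword:

#include <stdio.h>

/* Hint that the hot loop variables should live in registers.
   The compiler may ignore the hint; the only hard rule is that
   you may not take the address of a "register" variable. */
long sum_bytes(const unsigned char *buf, long n)
{
    register long i;
    register long total = 0;

    for (i = 0; i < n; i++)
        total += buf[i];
    return total;
}

int main(void)
{
    unsigned char data[4] = {1, 2, 3, 4};
    printf("%ld\n", sum_bytes(data, 4));   /* prints 10 */
    return 0;
}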

Another architecture of the past (I forget its name) kept the stack in CPU registers. That would make for some speedy branching. Of course, it sets a limit on stack depth that will seriously cramp the style of most compilers.

Hans
 
Which is what you can do with the technology available these days. Use registers, but use them as 'virtual' memory (not virtual in the traditional sense of the term). Registers will always be faster, but they add a lot of complexity to compiler optimisation, etc.
 
Another purpose of registers is to have a small address space that can be encoded in the instruction in only a few bits.

~~ Paul
 
I'm no expert, but here's the way I understand it.

Accessing data stored in registers on the CPU, which operate at full CPU speed, is faster than retrieving that data over the bus, which operates at a much slower speed. Typically several times faster.
 
It's been a long time since I looked at assembler (6502 Assembly baby!) but I thought a lot of instructions had particular registers implicit in the instruction. Something like add register A to register B. Without registers, wouldn't you need to specify the memory locations instead?

I guess you could go the way of Forth, where there are no variables, just the stack (virtual in the case of this processor), and have assembler instructions that work on the top X items of the stack.
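Something along these lines, as a made-up toy stack machine in C rather than any real instruction set:

#include <stdio.h>

/* Tiny Forth-style evaluator: no named registers or variables,
   every operation works on the top of a data stack. */
static int stack[64];
static int sp = 0;                 /* next free slot */

static void push(int v)  { stack[sp++] = v; }
static int  pop(void)    { return stack[--sp]; }

static void op_add(void) { int b = pop(), a = pop(); push(a + b); }
static void op_mul(void) { int b = pop(), a = pop(); push(a * b); }

int main(void)
{
    /* compute (3 + 4) * 5 the stack-machine way: 3 4 + 5 * */
    push(3);
    push(4);
    op_add();
    push(5);
    op_mul();
    printf("%d\n", pop());         /* prints 35 */
    return 0;
}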
 
Modern L0 caches *are* essentially registers, in that L0 cache memory performs no worse than registers with regard to fetching and storing machine words.

Paul C. Anagnostopoulos has hit the nail on the head.

Encoding a reference to (for instance) 1 of 16 registers requires 4 bits, so an instruction that references two registers requires 2*4 = 8 bits of overhead for the referencing.

Encoding a reference to (for instance) 1 of 4294967296 memory locations (4 gigabytes) requires 32 bits, so an instruction that references two memory locations requires 2*32 = 64 bits of overhead for referencing.

This 'overhead' comes into play in several ways. Program code would simply be larger if everything were treated as general-purpose memory, and that affects the efficiency of the instruction decode pipeline when it needs to be flushed (due to a branch misprediction) and has to start from scratch.
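To put rough numbers on it, here's a back-of-the-envelope comparison using an invented instruction format (not any real ISA):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical two-operand ADD on a 16-register machine:
       8-bit opcode + two 4-bit register numbers = 16 bits total. */
    uint16_t reg_form = (uint16_t)((0x21 << 8) | (3 << 4) | 7);   /* "ADD R3, R7" */

    /* The same ADD naming two 32-bit memory addresses instead:
       8-bit opcode + 2 * 32-bit addresses = 72 bits, i.e. 9 bytes. */
    unsigned reg_bits = 8 + 4 + 4;
    unsigned mem_bits = 8 + 32 + 32;

    printf("register form: %u bits (encoded 0x%04X)\n", reg_bits, (unsigned)reg_form);
    printf("memory form:   %u bits\n", mem_bits);
    return 0;
}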
 
Hey, Betamax was pretty good too.

Actually, it was better quality. However, the tapes had a shorter recording time. But the thing that killed it off was that Sony refused to license Beta without licensing fees. If Sony had never imposed those fees, VHS would never have been created in response.

(6502 Assembly baby!) ... Something like add register A to register B.

Actually, the 6502 had only one general-purpose register, register A. However, one could use the Index X and Index Y registers as extra registers (giving up some memory addressing modes).

6502! Boo-ya!

Actually, the 6502 was one of the most popular processors ever made. But the most popular device that used it... the Nintendo video game system.

But back to the subject, registers are still much faster than memory. Internal registers operate at the CPU speed, whereas memory just about never does.
 
Even ignoring video game systems, the 6502 was probably the most popular 8-bit processor in personal computers; it appeared in Apples, Commodores, Ataris and a bunch of lesser-known systems. In some cases minor variations were used, such as the 6510 in the Commodore 64, but that's a trivial difference.

The Z80 would give it a good run for its money, though, appearing in almost all the remaining systems: TRS-80s and most CP/M machines, although the latter tended to be business computers. The original 8080 and its variants were used a lot less than the Z80.

A Z80 variant was used in some Nintendo systems such as the original Game Boy, so I'm not sure the 6502 necessarily comes out on top if you include game systems, though.

Arguably, the 6502 treated memory partly as a register array: a number of its "zero page" addressing modes used one-byte addresses for the lowest 256 bytes of memory.

Some microcontrollers, like the 8051 family, do something similar, with a 64KB address space where the first 256 bytes have special significance, and in the case of the 8051 those bytes of memory are implemented in the processor and don't require an external bus access.
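In emulator terms the split looks roughly like this (an 8051-flavoured toy in C with made-up names; the real chip's memory map is more involved):

#include <stdint.h>
#include <stdio.h>

/* Toy memory read for an 8051-like split address space:
   the first 256 bytes live on-chip, everything else has to go
   out over the (slower) external bus. */
static uint8_t internal_ram[256];
static uint8_t external_ram[65536];

static uint8_t mem_read(uint16_t addr)
{
    if (addr < 0x0100)
        return internal_ram[addr];      /* on-chip: no external bus cycle */
    return external_ram[addr];          /* off-chip: costs a bus access */
}

int main(void)
{
    internal_ram[0x20] = 42;
    external_ram[0x1234] = 7;
    printf("%u %u\n", (unsigned)mem_read(0x0020), (unsigned)mem_read(0x1234));
    return 0;
}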
 
Actually, it was better quality. However, the tapes had a shorter recording time. But the thing that killed it off was that Sony refused to license Beta without licensing fees. If Sony had never imposed those fees, VHS would never have been created in response.


Yes I am aware of this.

There was a subtle comment on the current discussion in that joke that I think you may have missed.
 
Modern L0 caches *are* essentially registers ... an instruction that references two registers requires 2*4 = 8 bits of overhead ... an instruction that references two memory locations requires 2*32 = 64 bits.

Registers are only used for short-term memory addressing, hence the use of a stack for any quick memory addressing. Might work.
 
Registers are only used for short-term memory addressing, hence the use of a stack for any quick memory addressing. Might work.

Of course, the top of the stack on desktops (Intel/AMD) is almost always in the L0 cache, and on those rare occasions when it isn't, it is almost always in the L1/L2 cache.

Further, the stack already has special addressing modes: a displacement from the current stack pointer (the ESP register) as an 8-bit, 16-bit, or 32-bit signed value. There is also a second, implicit base register for stack frames, EBP, that can be used alongside the stack pointer.

And the most important point is that on the x86 derivatives, the stack has absolutely no implicit size. It's up to the software to decide that sort of thing (i.e., the be-all-and-end-all generic solution).

Off the top of my head, the current x86 has these basic addressing schemes:

[REGISTER]
[ABSOLUTE ADDRESS]

[REGISTER + REGISTER]
[REGISTER + 8-BIT OFFSET]
[REGISTER + 16-BIT OFFSET]
[REGISTER + 32-BIT OFFSET]

[REGISTER + REGISTER + 8-BIT OFFSET]
[REGISTER + REGISTER + 16-BIT OFFSET]
[REGISTER + REGISTER + 32-BIT OFFSET]

[REGISTER * 2-BIT SCALE (1, 2, 4, or 8)]
[REGISTER * 2-BIT SCALE + REGISTER]
[REGISTER * 2-BIT SCALE + 8-BIT OFFSET]
[REGISTER * 2-BIT SCALE + 16-BIT OFFSET]
[REGISTER * 2-BIT SCALE + 32-BIT OFFSET]

[STACK POINTER]
[STACK POINTER + 8-BIT OFFSET]
[STACK POINTER + 16-BIT OFFSET]
[STACK POINTER + 32-BIT OFFSET]

[STACK POINTER + BASE POINTER]
[STACK POINTER + BASE POINTER + 8-BIT OFFSET]
[STACK POINTER + BASE POINTER + 16-BIT OFFSET]
[STACK POINTER + BASE POINTER + 32-BIT OFFSET]

I might have added a few that don't really exist (for example, in a few cases there might be a full prefix byte before the instruction to select 16-bit when in 32-bit mode... I can't remember if that applies to addressing or not)... also, I might have left out a few (I sort of have a life).

And I know I left out the fact that all of these can be prefixed by a segment/selector override (most instructions presume the DS segment/selector, while the ES, FS, GS, and SS selectors are also an option with the proper override prefix).
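All of those modes boil down to one formula: effective address = base + index*scale + displacement. A quick illustrative sketch of that calculation (not how a real decoder is implemented):

#include <stdint.h>
#include <stdio.h>

/* The general shape of an x86 memory operand:
   effective address = base + index * scale + displacement,
   where scale is 1, 2, 4 or 8 and the displacement is a signed
   constant encoded in the instruction itself. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}

int main(void)
{
    uint32_t ebx = 0x00100000u;  /* base register */
    uint32_t esi = 3u;           /* index register */

    /* something like MOV EAX, [EBX + ESI*4 + 8] */
    printf("0x%08X\n", (unsigned)effective_address(ebx, esi, 4, 8));
    return 0;
}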

I don't see how a registerless system can mimic all these addressing modes without A) using more than one instruction to do what currently can be done with one, or B) being just as complex as the current system, in which case it's just a transformation of the current situation.

Edited to add:

On top of it all, under the hood a modern CISC processor is really a RISC processor. The transformation layer is virtually 'free' in terms of performance, because the transformations take place on separate circuitry, in parallel, while previously transformed instructions are being executed. The only time it matters is when the instruction pipeline has to be flushed, and in those cases the penalty is already so large (think of an 8-man bucket brigade) that the extra clock cycle (using 8 men instead of 7) to do the transformation is only a minor issue. Ideally you never flush the instruction pipeline, and that's the real solution.
 
