Femto-4v0.5 (Computer)
0 Stars     289 Views    

Author: Sanderokian Stfetoneri

Forked from: Sanderokian Stfetoneri/Femto-4 (Computer)

Project access type: Public

Description:

Latest versions of the 256-Series, including the Femto-4:
https://circuitverse.org/users/4699/projects/256-series

A 16-bit computer/maybe console inspired thing, the Femto-4. This is a fork from the main branch to keep a semi-functional version around. This project was started around November 2020.

Currently runs:
Test code demonstrating basic functionality. First it uses most instructions to ensure they work, before showing the graphical capabilities of the computer.

Features:
Immediate, direct and indirect memory access
Jumps and conditional Jumps
16-bit address space
Switchable Memory Banks
An ALU capable of logical operators, addition, subtraction and shift left
Easy to add to buses
"Fast Execution" - Can run more than one instruction per clock cycle
15x15 pixel display

Will have:
An ALU capable of shift right and multiplying
Inputs, both "controllers" and keyboards
"Faster Execution" - Runs instructions on both edges of the clock pulse 
Random number register
Text outputs
Stack
Assembler (hopefully)
Save memory
Several pre-written carts to play with

General Architecture:
The Femto-4 has variable length instructions that are comprised of multiple 16-bit chunks. First the OP Code of the instruction is read, and then depending on the OP Code, additional pieces of data may be read for the operands. This allows execution to become incorrectly offset, which can lead to the execution of garbage if the PC is jumped to an incorrect address. This is usually fine, since the OP Code space is so empty that the data will likely be passed one at a time until the next valid instruction. Data is read through the standard data retrieval system (which is handy since its design is so universal and easy to add to) making this architecture a Von Neuman architecture as opposed to a Harvard architecture, like my previous, worse, computer. The MAR always specifies the address being read to or written from, whilst the MDR always holds the data being written. Data from the data out bus can be written to any special register during the instruction. OP Codes and operands are all 16-bits, which is a bit wasteful in terms of OP Code usage, however it was easier to implement this way, and so that is what I went with. 

Memory Mapping:
The 16-bit address space of the Femto-4 is memory mapped, with all data being stored somewhere in the address space, including many special registers like the program counter and the Memory Address Register. The last 48kx16b of memory (all addresses starting with 01, 10, or 11) are dedicated to the cart memory. This is where the interchangeable program would be stored, allowing programs to be easily changed by changing carts. (However, currently there is only one cart). The carts have 32 16kx16b EEPROM/RAM chips, which can be switched between during execution by writing to address 00cc (subject to change). This gives each cart 512kx16b of memory to play with. In theory, additional memory can be added in a cart by creating a similar system on the inside of the cart, which would allow it to swap between even more EEPROM/RAM chips. The initial 16kx16b are therefore mapped to everything else, including a fixed "work" RAM chip that cannot be switched out, the bootloader, the PPU data, general use registers and special use registers. 

"Fast Execution":
Execution at the fastest clock speed (one pulse every 50ms, or 20Hz) is terribly slow, and would make reasonable graphics effectively impossible. Due to this, the Femto-4 includes several execution modes that allow the computer to run much faster. There are two registers involved in this, address 00ca, the mode register, and address 00cb, the protection register. When the least significant bit of the mode register is low, the computer runs normally, executing 1 instruction per clock pulse. When it is set high however, the computer enters fast execution, where it executes multiple instructions per clock pulse. This is achieved by looping a rising edge monostable circuit into a falling edge monostable circuit, producing a loop that will pulse indefinitely until the looping line is written high too by some external factor. Stopping the loop is critical since leaving the loop running will stop CircuitVerse's execution, due to it going over the stack limit of the execution. "Fast execution" is always paused by a 0000 OP Code, which ensures that the computer will not attempt to "fast execute" memory that has not been written to. Setting the second bit of the mode register will enable protection. This will ensure that computer only executes as many instructions as the value in the protection register. This protects execution by ensuring that the loop will always pause before the cycle limit is reached. A value of 8 works consistently, though I have not toyed with values much higher. In future, additional execution options will likely be made available, the current planned ones being enabling falling edge execution as well as rising edge execution, to double the execution speed, updating graphics on both edges of the clock pulse, updating graphics every other clock pulse, updating graphics when the update graphics command is run, and disabling graphics. 

Graphics:
The Femto-4 is capable of driving a 15x15 15bit direct colour screen. It has space for 32 "sprites" which are rectangles with an assigned colour. Currently, every time the clock pulses low, the screen is refreshed. Should a falling edge "fast execution" mode be added and work, the falling edge should only be used to execute game code, since writing graphics data as the screen is being drawn may mess up the graphics. These 32 "sprites" have their data stored in the PPU RAM in the following format: The first 16 bits are the corners of the rectangle, with each coordinate being 4 bits. The coordinates are ordered x1 x2 y1 y2. The next 16 bits are the sprites colour, with the first 15 bits being used for 15 bit direct colour, and the last bit being used to enable or disable drawing the sprite. Since the screen is not wiped every time it is refreshed, the background must be sprite to ensure that the screen is fully wiped before the rest of the sprites are drawn on. The "sprites" are drawn in memory order, with the "sprite" with the largest address always being drawn last and therefore on top, of all other "sprites". This is achieved by using the exact same monostable clock system as "Fast Execution", which reads off all the sprite data and draws them to the screen in a single clock pulse. This can loop more times safely than the main CPU since it has less dependencies which dramatically increase the simulation's stack usage. The demonstration code uses two "sprites". The first "sprite" is the black background, and the second "sprite" is the red rectangle. The coordinate value of the red rectangle is incremented every frame, causing the animation. Had I had more storage I might have incremented the colour as well to show the colour capabilities of the Femto-4. Whilst driving a larger screen might be nice, given the limit in the number of instructions per second, it is unlikely that it could be well utilised, which is why I have chosen this screen size. A variation with larger screens may appear at some point, but this is a low priority for me. 

ALU:
The basic ALU (currently the only implemented bit of the ALU) was inspired by the ALU-74LS181. It was designed to flexibly change between various operations by changing an additional piece of data which is bundled in the OP Code. This allows a single ALU to handle all the required processes, such as the basic binary logic operations, shift right, adding and subtracting. This is unlike my previous computer which had different chips for each operation it could do. Additional capability chips, such as multiplication and shifting right will be added later. 

General Registers:
This computer probably has more general registers than it should. What makes the 256 general registers unique is that they can be easily piped into the A and B operators when performing ALU operations. This allows ALU instructions to only have one operand, with the lower 8-bits being the register address of value A, and with the higher 8-bits being the register address of value B. 

Timing:
This computer is timed using a several standard delay chips. The pulse length running in to the computer is about 10k units long. This then runs into the pulse generator which pulses 32 unique lines with a 20k delay between them which can then be used to time when control lines pulse. This is bundled together into a 32-bit timing bus, which then uses bit selectors to select how much delay that pulse will have. This is why there are 32 sub-circuits which are effectively just bit specific bit selectors - they allow me to "compactly" build timing circuitry. In addition to the main delay of 20k between the chosen bits, I can add "on-delays" to further delay the control line, allowing me to ensure that control lines like the enable read line can be on before the register reads from them. "On-delays" were first constructed to ensure that the data out line did not have contention issues - it ensured that the previous address outputting data was disabled before the next address outputting data onto the line was enabled. They add 1k of delay on the rising edge, and less than 10 delay on the falling edge. This way I do not need to worry about "on-delays" increasing the delay of one command into the next. The "Fast Execution" loop gives a pulse of 300k, with a delay between pulses of around 600k. This ensures that the previous instruction will finish before the next instruction starts. I do not entirely understand how the timing system works, since the in my mind it should be producing contention issues, however proofing it against that breaks the system entirely. 

Other Notes:
You may note that I use 32b EEPROM banks instead of 16b ones. This choice was made to reduce the number of EEPROM banks required. Each half of the EEPROM's 32b output is treated as one address. Whilst this added a slight bit of additional complexity in writing, it halved the number of EEPROM banks required. This project was started when I realised that EEPROM banks could be that large, since a major sticking point in a previous attempt was the number of EEPROM banks required. (That attempt is private and completely dysfunctional. It also suffers from contention errors caused by incomplete splitters.)

Created: Dec 09, 2020

Updated: Aug 26, 2023


Comments

You must login before you can post a comment.