Today, semiconductor chips are used everywhere. From computers to refrigerators, almost every electronic product contains one. As their usage has increased, so has the need for specialized chips. This blog is about generators for customisable cores.

Overview

This is a summary of a talk by Neel Gala, CTO and co-founder of InCore Semiconductors. He aims to build generators for configurable cores (processors). He was also part of SHAKTI, a research group at IIT Madras. InCore was registered in 2018. He seeks to produce configurable, make-in-India cores.

Architects

Processor architects strive to make fast cores. They try to create application-specific cores: a distinct processor each for a PC, TV, fridge, Chromecast, router, and many other devices.

We need architects to design a core and many more engineers to verify that it is correct. Moreover, components common to several cores are verified again and again. If different teams build different cores separately, a lot of resources are wasted. This is where the concept of customizable cores comes in: just as we customize our food in a restaurant, we can choose which components we need in our core. Configurability is the key to achieving customizable cores that make the most of available resources.

Configurability

For configurability, we need components, a list of the components required in the product, and a way to combine them. In processors, the components are the various functionalities (or instructions) and the various types of optimizations. We also need to verify the final result.

Core Generators

Core generators take an ISA configuration as input, which gives the user fine control over several architectural choices. The microarchitecture is then configured based on that ISA configuration. Each baseline core also comes with numerous ISA-independent configuration options. Finally, the generator configures individual hardware components using Python, producing a synthesizable instance of the core. Generation of documentation, physical design collateral, and so on is also automated through collateral generation. Once a core is generated, it has to be verified. This is where Verif Generators come in.
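
To make this flow concrete, here is a minimal Python sketch of how an input configuration might be turned into a list of components. It is not InCore's generator: `CoreConfig`, `select_components`, and every field and component name below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical input configuration; field names are illustrative only."""
    isa: str = "RV32I"              # base ISA followed by extension letters
    icache_kb: int = 0              # ISA-independent choice; 0 means no cache
    dcache_kb: int = 0
    branch_predictor: bool = False


def select_components(cfg: CoreConfig) -> List[str]:
    """Return the hardware blocks this configuration would instantiate."""
    if not cfg.isa.startswith(("RV32I", "RV64I")):
        raise ValueError(f"unsupported ISA string: {cfg.isa}")
    extensions = cfg.isa[5:]        # letters after the base, e.g. "MF"
    components = ["fetch", "decode", "execute", "memory", "writeback"]
    # ISA-dependent components
    if "M" in extensions:
        components.append("multiplier")
    if "F" in extensions:
        components.append("fpu")
    # ISA-independent components
    if cfg.icache_kb > 0:
        components.append("icache")
    if cfg.dcache_kb > 0:
        components.append("dcache")
    if cfg.branch_predictor:
        components.append("branch_predictor")
    return components


cfg = CoreConfig(isa="RV64IMF", icache_kb=16, branch_predictor=True)
print(select_components(cfg))
```

The point of the sketch is the split between ISA-dependent choices (the extension letters) and ISA-independent ones (caches, predictor). A real generator would go on to emit synthesizable hardware rather than a list of names.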

Verif Generators

Verif Generators generate the verification environment for cores produced by Core Generators. First, ISA compliance is verified: relevant test suites are generated using tools like RISC-V ISAC. Then the microarchitecture is verified with a configurable, parameterized test suite written in Python. An adaptable test harness and quality metrics follow. Finally, the RIVER-CORE umbrella framework is used to generate detailed HTML reports. Python is the only language used, which makes getting started easier, as one doesn’t need to learn several new languages and their syntax first.
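
As a rough illustration of what a parameterized test suite means, the sketch below generates one test case per combination of instruction and operand values, driven by the extensions in the core's configuration. This is not the RISC-V ISAC or RIVER-CORE interface. The function `generate_tests` and the instruction table are assumptions made up for this example.

```python
from itertools import product

# Hypothetical mapping from ISA extension letter to instructions to exercise
INSTRUCTIONS_BY_EXT = {
    "I": ["add", "sub"],
    "M": ["mul", "div"],
}


def generate_tests(extensions, operand_values):
    """Yield one (instruction, rs1_value, rs2_value) tuple per combination."""
    for ext in extensions:
        for instr in INSTRUCTIONS_BY_EXT.get(ext, []):
            # Every pair of operand values, including corner values such as 0 and -1
            for rs1, rs2 in product(operand_values, repeat=2):
                yield (instr, rs1, rs2)


tests = list(generate_tests("IM", [0, 1, -1]))
print(len(tests), tests[0])
```

Because the tests are derived from the same configuration as the core, a core without the M extension would simply get no multiply or divide tests.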

You might be wondering: why RISC-V? RISC-V is open and freely available. Beyond that, the main reason is that it is highly flexible, and a flexible base architecture is a great fit for a configurable core.

Evolution of configurable cores

Idea

Here are some of the things one needs to take care of while planning to make a configurable core:

  • Limit your options to practical ones.
  • Re-usability of available components.
  • Break the design down into smaller modules.

These apply not only to configurable cores but to anything. If you want to make something new, it is important to limit yourself to practically useful ideas and not to redo what is already available.

Example

Suppose we start with a basic 5-stage vanilla pipeline and want to increase its performance. Here is how we can do that step by step:

  • Speed up — Adding Cache: Memory access is usually a bottleneck. If we add caches to the instruction fetch and memory stages, instructions and data can be fetched in fewer cycles, which can increase speed significantly.
  • Speed up — Branch Predictors: On average, about 20% of the instructions in a program are branches. Our pipeline is set back by 2 cycles every time there is a branch. To avoid this, we can use a branch predictor in the instruction fetch stage. A branch predictor takes the PC as input and predicts whether the instruction is a branch and, if so, the PC it jumps to. It works by learning from the history of branch outcomes when the instruction was executed earlier.
  • Usage — Floating Point Calculations: For floating point instructions, we need to add an FPU in the execute stage.
  • Speed up — Forwarding: If an instruction depends on the result of the previous instruction, it needs to stall for two cycles. To prevent this, we can add forwarding: the result of the previous instruction's execute stage is fed directly into the execute stage of the next instruction, before RegWrite happens, which avoids the stall. When the previous instruction only gets its value in the memory stage (as a load does), the next instruction can be passed through to the memory stage, and an ALU added in the memory stage can complete it there.
  • Speed up — Virtual Address Translation: The operating system maps virtual addresses to real (physical) addresses. We can add a TLB (Translation Lookaside Buffer) alongside the caches to store recent translations for quick address translation.
  • Speed up — Threads: We can combine two such pipelines in parallel, sharing the caches and the FPU. This way, we can fetch two instructions per cycle instead of one. As caches and FPUs are considerably large circuits, we share them between the two pipelines and duplicate the other components.
  • Cache Improvement — Cache Hierarchy: A cache hierarchy organizes the cache into levels. The larger a cache is, the more cycles it takes to access. Different levels of cache can store different data based on access patterns.
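
To put numbers on the branch predictor step: with 20% branches and a 2-cycle setback on each, an otherwise ideal pipeline with a CPI of 1 ends up at about 1 + 0.2 × 2 = 1.4 cycles per instruction. The Python sketch below computes that figure and shows one common way a predictor can learn from history, a table of 2-bit saturating counters. This scheme is an assumption for illustration, not necessarily the one used in the talk. It predicts only whether a branch is taken, not the target PC, and the names `TwoBitPredictor` and `cpi_with_branches` are made up for this example.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low PC bits (illustrative)."""

    def __init__(self, entries=16):
        self.entries = entries
        # Counter values 0-1 predict "not taken", 2-3 predict "taken"
        self.counters = [1] * entries

    def _index(self, pc):
        return (pc >> 2) % self.entries   # assumes 4-byte instructions

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter one step toward the actual outcome."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def cpi_with_branches(branch_fraction, penalty_cycles, mispredict_rate=1.0):
    """Ideal CPI of 1 plus the average stall cycles lost to branches."""
    return 1.0 + branch_fraction * mispredict_rate * penalty_cycles


predictor = TwoBitPredictor()
loop_branch_pc = 0x100
for _ in range(3):                        # a loop branch taken three times
    predictor.update(loop_branch_pc, taken=True)
print(predictor.predict(loop_branch_pc))  # now predicts taken

print(cpi_with_branches(0.2, 2))                       # no predictor
print(cpi_with_branches(0.2, 2, mispredict_rate=0.1))  # assumed 10% mispredictions
```

The 10% misprediction rate in the last line is an assumed value, used only to show how the penalty shrinks once most branches are predicted correctly.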

Summary and Tips

This post explained the importance of configurable cores and some steps in making them, along with an example evolution of a core from a 5-stage vanilla pipeline. Here are some tips if you get a new idea and want to work on it.

  • Always design things for the common case, as corner cases are rare.
  • Automate only for scalability.
  • Complete what you have started so that it is usable.
  • Re-use available tools for better efficiency.