Talks & Seminars
Title: The Plural Architecture: Shared Memory Many-cores with Hardware Scheduling
Prof. Ran Ginosar, Dept. of Electrical Engg. & Computer Science at the Technion, Israel
Date & Time: February 24, 2012 15:00
Venue: Conference Room, 1st Floor, C Block, Dept. of Computer Science & Engg., Kanwal Rekhi Bldg.
Abstract:
The Plural many-core architecture combines hundreds of simple cache-less cores, many shared cache banks, a hardware scheduler, and two custom active networks-on-chip: cores-to-shared-caches and cores-to-scheduler. A theoretical model (almost) justifies increasing the number of cores while making them smaller and slower, maximizing the performance-to-power ratio. Several benchmark simulations are demonstrated, showing close to linear speedup and a high performance-to-power ratio.
A de-synchronized, PRAM-like, task-based, non-CSP programming model for shared memory enables fine-grain parallelism. Plural tasks are sequential. Precedence relations among tasks are described by a task map, which is executed by the hardware scheduler. Duplicable tasks are described once and executed as multiple instances, under control of the hardware scheduler. Tasks are not functions: they neither receive inputs nor generate outputs; data are shared only through shared memory. Control tasks (join, fork, condition) contain no code and are executed only by the scheduler. There are no locking mechanisms; all synchronizations are formulated as inter-task dependencies and managed by the scheduler.
The shared memory is organized as many banks, allowing all or most cores simultaneous access. A multistage interconnection network resolves address conflicts and may include a fetch-and-op facility to enhance PRAM-like concurrent read-and-write as well as unique indexing operations. Addresses are interleaved to reduce conflicts. The entire shared memory is organized as a shared L1 cache. The architecture supports an optional on-chip L2 cache. The Plural architecture employs standard processors; we have tried Sparc, Microblaze and some proprietary ones. Cores contain a small private scratch-pad memory for unshared variables. Shared co-processors include an FPU and collective support. DMA processors provide for data pre-fetching.
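To illustrate why interleaved addresses reduce bank conflicts, here is a minimal Python sketch. It assumes low-order word interleaving, 8 banks and 4-byte words; the abstract states only that addresses are interleaved, so the mapping, the constants and the names `bank_of` and `count_conflicts` are hypothetical and do not describe the actual Plural hardware.

```python
from collections import Counter

NUM_BANKS = 8    # assumed bank count, for illustration only
WORD_BYTES = 4   # assumed word size, for illustration only

def bank_of(addr):
    """Map a byte address to (bank, row) under low-order word interleaving."""
    word = addr // WORD_BYTES
    return word % NUM_BANKS, word // NUM_BANKS

def count_conflicts(addrs):
    """Count accesses that would have to wait if all addrs were issued in one cycle.

    Each bank serves one access per cycle in this toy model, so every extra
    access to the same bank counts as one conflict.
    """
    per_bank = Counter(bank_of(a)[0] for a in addrs)
    return sum(n - 1 for n in per_bank.values())

# Eight cores reading consecutive words: each word lands in a different bank.
unit_stride = [i * WORD_BYTES for i in range(8)]

# Eight cores reading with a stride equal to the bank count: all hit bank 0.
bank_stride = [i * WORD_BYTES * NUM_BANKS for i in range(8)]

print(count_conflicts(unit_stride))
print(count_conflicts(bank_stride))
```

In this toy model, consecutive words spread across all banks, while a stride equal to the number of banks sends every access to the same bank. Resolving the remaining conflicts is the job the abstract assigns to the multistage interconnection network.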
The Plural architecture is intended for one-job-at-a-time accelerators; it is not a multitasking multicore, and there should be no OS. The architecture has been implemented as an IP core for mobile SoC and as an FPGA accelerator. It has yet to be demonstrated as a standalone IC. During the talk we will also contrast it with other many-core architectures including Tiles, Rigel and XMT.
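The task-map model described in the abstract can be pictured with a small software simulation. The Python sketch below is not the Plural scheduler or its task-map syntax: the function `run_task_map`, the dictionary layout, and the instance index passed to each duplicable task instance are all assumptions made for illustration. It shows tasks that communicate only through shared memory, code-free control tasks (fork, join), and a duplicable task declared once and run as several instances, with ordering enforced purely by dependencies rather than locks.

```python
from collections import deque

def run_task_map(tasks, deps):
    """Dispatch tasks in dependency order (sequentially, as a toy model).

    tasks: name -> (body or None, instance_count); body is None for control tasks.
    deps:  name -> set of task names that must finish first.
    """
    pending = {name: len(deps.get(name, ())) for name in tasks}
    successors = {name: [] for name in tasks}
    for name, prereqs in deps.items():
        for p in prereqs:
            successors[p].append(name)

    ready = deque(n for n in tasks if pending[n] == 0)
    order = []
    while ready:
        name = ready.popleft()
        body, count = tasks[name]
        if body is not None:
            # A duplicable task is described once and run `count` times.
            # Passing the instance index is an assumption of this sketch.
            for i in range(count):
                body(i)
        order.append(name)
        for s in successors[name]:
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)
    if len(order) != len(tasks):
        raise ValueError("task map contains a cycle")
    return order

# Shared memory stand-in: tasks read and write here, never via arguments/returns.
shared = {"data": list(range(8)), "squares": [0] * 8, "total": 0}

def square(i):
    shared["squares"][i] = shared["data"][i] ** 2

def reduce_sum(_):
    shared["total"] = sum(shared["squares"])

tasks = {
    "fork": (None, 1),        # control task, no code
    "square": (square, 8),    # duplicable task, 8 instances
    "join": (None, 1),        # control task, no code
    "sum": (reduce_sum, 1),
}
deps = {"square": {"fork"}, "join": {"square"}, "sum": {"join"}}

order = run_task_map(tasks, deps)
print(order, shared["total"])
```

A real hardware scheduler would dispatch the eight `square` instances to different cores concurrently; the sketch runs them in a loop only to keep the example deterministic.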
Speaker Profile:
Prof. Ran Ginosar received his BSc from the Technion and his PhD from Princeton University in 1982. He has conducted research at Bell Laboratories, at the University of Utah and at Intel Research Laboratories in Oregon, USA. He is a member of the faculty of the EE and CS departments at the Technion, and heads the VLSI Systems Research Center. He has also co-founded several start-up companies in the area of VLSI and parallel processing. His research interests focus on VLSI and parallel processing architectures.