Department of Collegiate and Technical Education
Theory of Parallelism (Module 1 – Session1)
Course Outcome: Explain the concepts of parallel computing and
hardware technologies
Advanced Computer Architecture (VII Semester)
Computer Science and Engineering
Computer Science & Engineering – 17CS72
1. Parallel Computer Models
1.1 The State of Computing
1.1.1 Computer Development Milestones
1.1.2 Elements of Modern Computers
1.1.3 Evolution of Computer Architecture
1.2 Multiprocessors and Multicomputers
1.2.1 Shared-Memory Multiprocessors
1.2.2 Distributed-Memory Multicomputers
1.2.3 A Taxonomy of MIMD Computers
1.3 Multivector and SIMD Computers
1.3.1 Vector Supercomputers
1.3.2 SIMD Supercomputers
1.4 PRAM and VLSI Models
1.4.1 Parallel Random-Access Machines
1.4.2 VLSI Complexity Model
Table of Contents
1. Parallel Computer Models
• Parallel computer models have transformed the world we live in.
• The key enabling technology in modern computers is Parallel
processing.
• Demand for higher performance, lower costs, and sustained
productivity in real-life applications.
• Forms of Parallelism-
– Lookahead, pipelining, vectorization, concurrency,
simultaneity, data parallelism, partitioning, interleaving,
overlapping, multiplicity, replication, time sharing, space
sharing, multitasking, multiprogramming, multithreading, and
distributed computing.
• Physical architectures and theoretical machine models are
discussed here.
1.1 The State of Computing
• Early computing was entirely mechanical:
– abacus (about 500 BC)
– mechanical adder/subtracter (Pascal, 1642)
– difference engine design (Babbage, 1827)
– binary mechanical computer (Zuse, 1941)
– electromechanical decimal machine (Aiken, 1944)
• Mechanical and electromechanical machines have limited speed and
reliability because of the many moving parts. Modern machines use
electronics for most information transmission.
1.1.1 Computer Development Milestones
• Computing is normally thought of as being divided into
generations.
• Each successive generation is marked by sharp changes in
hardware and software technologies.
• With some exceptions, most of the advances introduced in
one generation are carried through to later generations.
• We are currently in the fifth generation.
First Generation (1945 to 1954)
• Technology and Architecture
– Vacuum tubes and relay memories
– CPU driven by a program counter (PC) and accumulator
– Machines had only fixed-point arithmetic
• Software and Applications
– Machine and assembly language
– Single user at a time
– No subroutine linkage mechanisms
– Programmed I/O required continuous use of CPU
• Representative systems: ENIAC, Princeton IAS, IBM 701
Second Generation (1955 to 1964)
• Technology and Architecture
– Discrete transistors and core memories
– I/O processors, multiplexed memory access
– Floating-point arithmetic available
– Register Transfer Language (RTL) developed
• Software and Applications
– High-level languages (HLL): FORTRAN, COBOL,
ALGOL with compilers and subroutine libraries
– Still mostly single user at a time, but in batch mode
• Representative systems: CDC 1604, UNIVAC LARC, IBM
7090
Third Generation (1965 to 1974)
• Technology and Architecture
– Integrated circuits (SSI/MSI)
– Microprogramming
– Pipelining, cache memories, lookahead processing
• Software and Applications
– Multiprogramming and time-sharing operating systems
– Multi-user applications
• Representative systems: IBM 360/370, CDC 6600, TI ASC,
DEC PDP-8
Fourth Generation (1975 to 1990)
• Technology and Architecture
– LSI/VLSI circuits, semiconductor memory
– Multiprocessors, vector supercomputers, multicomputers
– Shared or distributed memory
– Vector processors
• Software and Applications
– Multiprocessor operating systems, languages, compilers,
and parallel software tools
• Representative systems: VAX 9000, Cray X-MP, IBM 3090,
BBN TC2000
Fifth Generation (1990 to present)
• Technology and Architecture
– ULSI/VHSIC processors, memory, and switches
– High-density packaging
– Scalable architecture
– Vector processors
• Software and Applications
– Massively parallel processing
– Grand challenge applications
– Heterogeneous processing
• Representative systems: Fujitsu VPP500, Cray MPP, TMC
CM-5, Intel Paragon
1.1.2 Elements of Modern Computers
• The hardware, software, and programming elements of
modern computer systems can be characterized by looking at
a variety of factors, including:
– Computing problems
– Algorithms and data structures
– Hardware resources
– Operating systems
– System software support
– Compiler support
Fig 1.1 Elements of a modern computer system
[Figure: layered view — computing problems, algorithms and data structures, high-level languages, application software, operating system, and hardware architecture, linked by mapping, programming, binding (compile, load), and performance evaluation]
Computing Problems
• Numerical computing
– complex mathematical formulations
– tedious integer or floating-point computation
• Transaction processing
– accurate transactions
– large database management
– information retrieval
• Logical Reasoning
– logic inferences
– symbolic manipulations
Algorithms and Data Structures
• Traditional algorithms and data structures are designed for
sequential machines.
• New, specialized algorithms and data structures are needed to
exploit the capabilities of parallel architectures.
• These often require interdisciplinary interactions among
theoreticians, experimentalists, and programmers.
Hardware Resources
• The architecture of a system is shaped only partly by the
hardware resources.
• The operating system and applications also significantly
influence the overall architecture.
• Not only must the processor and memory architectures be
considered, but also the architecture of the device interfaces
(which often include their advanced processors).
Operating System
• Operating systems manage the allocation and deallocation of
resources during user program execution.
• UNIX, Mach, and OSF/1 provide support for
– multiprocessors and multicomputers
– multithreaded kernel functions
– virtual memory management
– file subsystems
– network communication services
• An OS plays a significant role in mapping hardware resources
to algorithms and data structures.
System Software Support
• Compilers, assemblers, and loaders are traditional tools for
developing programs in high-level languages. With the
operating system, these tools determine the binding of resources
to applications, and the effectiveness of this determines the
efficiency of hardware utilization and the system’s
programmability.
• Most programmers still employ a sequential mindset, abetted
by a lack of popular parallel software support.
System Software Support (contd.)
• Parallel software can be developed using entirely new
languages designed specifically with parallel support as their
goal, or by using extensions to existing sequential languages.
• New languages have obvious advantages (like new constructs
specifically for parallelism), but require additional
programmer education and system software.
• The most common approach is to extend an existing
language.
Compiler Support
• Preprocessors
– use existing sequential compilers and specialized libraries
to implement parallel constructs
• Precompilers
– perform some program flow analysis, dependence
checking, and limited parallel optimizations
• Parallelizing Compilers
– require full detection of parallelism in source code, and
transformation of sequential code into parallel constructs
• Compiler directives are often inserted into source code to aid
compiler parallelizing efforts
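As a concrete illustration (not from these slides), the sketch below marks a loop with an OpenMP directive, one real and widely supported directive set for C; the file name and build flag are incidental:

```c
/* Illustrative sketch: a compiler directive aiding parallelization.
   Build with OpenMP support, e.g. gcc -fopenmp example.c */
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* The directive asserts the loop has no loop-carried dependences,
       so the compiler may distribute iterations across processors. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = a[i] + b[i];

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```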
1.1.3 Evolution of Computer Architecture
• Architecture has gone through evolutionary, rather than
revolutionary, change.
• Sustaining features are those that are proven to improve
performance.
• Starting with the von Neumann architecture (strictly
sequential), architectures have evolved to include processing
lookahead, parallelism, and pipelining.
Architectural Evolution
Fig 1.2 Tree showing architectural evolution from sequential scalar computers to vector processors and parallel computers
[Figure: evolution tree — scalar sequential → lookahead (I/E overlap) → functional parallelism (multiple functional units, pipeline) → implicit and explicit vector (memory-to-memory, register-to-register) → SIMD (associative processor, processor array) and MIMD (multiprocessor, multicomputer)]
Legends:
I/E: Instruction Fetch and Execute.
SIMD: Single Instruction stream and Multiple Data streams.
MIMD: Multiple Instruction streams and Multiple Data streams.
Flynn’s Classification (introduced by Michael Flynn in 1972)
• Single instruction, single data stream (SISD)
– conventional sequential machines
• Single instruction, multiple data streams (SIMD)
– vector computers with scalar and vector hardware
• Multiple instructions, multiple data streams (MIMD)
– parallel computers
• Multiple instructions, single data stream (MISD)
– systolic arrays
• Among parallel machines, MIMD is most popular, followed
by SIMD, and finally MISD.
Fig 1.3 (a) SISD uniprocessor architecture
[Figure: a control unit issues an instruction stream (IS) to a processing element (PE), which exchanges a data stream (DS) with main memory (M)]
Applications:
• Image processing
• Matrix manipulations
• Sorting
Fig 1.3 (b) SIMD architecture with distributed memory
SIMD Architectures
• Fine-grained
– Image processing application
– Large number of PEs
– Minimum complexity PEs
– Programming language is a simple extension of a
sequential language
• Coarse-grained
– Each PE is of higher complexity and it is usually built
with commercial devices
– Each PE has local memory
Fig 1.3 (c) MIMD architecture (with shared memory)
Fig 1.3(d) MISD architecture (the systolic array)
Applications:
• Classification
• Robot vision
Parallel/Vector Computers
• Intrinsic parallel computers execute in MIMD mode.
• Two classes:
– Shared-memory multiprocessors
– Message-passing multicomputers
• Processor communication
– Shared variables in a common memory (multiprocessor)
– Each node in a multicomputer has a processor and a
private local memory, and communicates with other
processors through message passing.
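A minimal sketch of the shared-variable style, using POSIX threads as stand-ins for processors sharing a common memory (an illustration, not a construct named in the text); on a multicomputer the counter would instead live in one node's private memory and be updated through explicit send/receive messages:

```c
/* Two "processors" (threads) communicate through a shared variable. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                       /* shared variable */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);             /* synchronized access */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);        /* prints 200000 */
    return 0;
}
```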
Pipelined Vector Processors
• SIMD architecture
• A single instruction is applied to a vector (one-dimensional
array) of operands.
• Two families:
– Memory-to-memory: operands flow from memory to
vector pipelines and back to memory
– Register-to-register: vector registers used to interface
between memory and functional pipelines
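For the register-to-register family, a loop such as the SAXPY below (a standard textbook kernel, used here as an assumed example) executes as vector loads into vector registers, a pipelined vector multiply-add, and a vector store, in chunks of the vector register length:

```c
/* SAXPY, y = a*x + y: the canonical vectorizable loop. A register-to-
   register vector machine strip-mines it into vector-register-length
   chunks (e.g. 64 elements): vload x, vload y, vmuladd, vstore y. */
void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```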
SIMD Computers
• Provide synchronized vector processing
• Utilize spatial parallelism instead of temporal parallelism
• Achieved through an array of processing elements (PEs)
• Can be implemented using associative memory.
Development Layers (Ni, 1990)
• Hardware configurations differ from machine to machine (even
with the same Flynn classification)
• Address spaces of processors vary among different architectures,
depend on the memory organization, and should match the target
application domain.
• The communication model and language environments should
ideally be machine-independent, to allow porting to many
computers with minimum conversion costs.
• Application developers prefer architectural transparency.
1.1.4 System Attributes to Performance
• Performance depends on
– hardware technology
– architectural features
– efficient resource management
– algorithm design
– data structures
– language efficiency
– programmer skill
– compiler technology
Performance Indicators
• Turnaround time depends on:
– disk and memory accesses
– input and output
– compilation time
– operating system overhead
– CPU time
• Since I/O and system overhead frequently overlap with processing by
other programs, it is fair to consider only the CPU time used by a
program; of this, the user CPU time is the most important factor.
Clock Rate and CPI
• CPU is driven by a clock with a constant cycle time τ (usually
measured in nanoseconds).
• The inverse of the cycle time is the clock rate (f = 1/τ,
measured in megahertz).
• The size of a program is determined by its instruction count,
Ic, the number of machine instructions to be executed by the
program.
• Different machine instructions require different numbers of
clock cycles to execute. CPI (cycles per instruction) is thus
an important parameter.
Average CPI
• It is easy to determine the average number of cycles per
instruction for a particular processor if we know the
frequency of occurrence of each instruction type.
• Of course, any estimate is valid only for a specific set of
programs (which defines the instruction mix), and then only if
there is a sufficiently large number of instructions.
• In general, the term CPI is used with respect to a particular
instruction set and a given program mix.
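A small sketch of the weighted-average computation; the instruction mix and cycle counts below are hypothetical, chosen only to show the arithmetic:

```c
/* Average CPI as a weighted sum over instruction types:
   CPI = sum_i freq_i * cycles_i (hypothetical mix). */
#include <stdio.h>

int main(void) {
    double freq[]   = { 0.45, 0.25, 0.20, 0.10 }; /* ALU, load, store, branch */
    double cycles[] = { 1.0,  2.0,  2.0,  3.0  };
    double cpi = 0.0;
    for (int i = 0; i < 4; i++)
        cpi += freq[i] * cycles[i];
    printf("average CPI = %.2f\n", cpi);          /* 1.65 */
    return 0;
}
```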
Performance Factors (1)
• The time required to execute a program containing Ic
instructions is just T = Ic × CPI × τ.
• Each instruction must be fetched from memory, decoded,
then operands fetched from memory, the instruction executed,
and the results stored.
• The time required to access memory is called the memory
cycle time, which is usually k times the processor cycle time
τ. The value of k depends on the memory technology and the
processor-memory interconnection scheme.
Performance Factors (2)
• The processor cycles required for each instruction (CPI) can
be attributed to
– cycles needed for instruction decode and execution (p),
and
– cycles needed for memory references (m × k).
• The total time needed to execute a program can then be
rewritten as T = Ic × (p + m × k) × τ.
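A worked illustration of this formula with assumed values (all numbers hypothetical):

```c
/* T = Ic * (p + m*k) * tau, evaluated for hypothetical parameters. */
#include <stdio.h>

int main(void) {
    double Ic  = 200000.0;    /* instruction count */
    double p   = 4.0;         /* decode/execute cycles per instruction */
    double m   = 1.2;         /* memory references per instruction */
    double k   = 4.0;         /* memory cycle / processor cycle ratio */
    double tau = 10e-9;       /* cycle time: 10 ns */
    double T   = Ic * (p + m * k) * tau;
    printf("T = %g s\n", T);  /* 200000 * 8.8 * 10 ns = 0.0176 s */
    return 0;
}
```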
System Attributes
• The five performance factors (Ic, p, m, k, τ) are influenced by
four system attributes:
– instruction-set architecture (affects Ic and p)
– compiler technology (affects Ic, p, and m)
– CPU implementation and control (affects p × τ)
– cache and memory hierarchy (affects memory access
latency, k × τ)
• Total CPU time can be used as a basis in estimating the
execution rate of a processor.
MIPS Rate
• If C is the total number of clock cycles needed to execute a
given program, then total CPU time can be estimated as
T = C × τ = C / f.
• Other relationships are easily observed:
– CPI = C / Ic
– T = Ic × CPI × τ
– T = Ic × CPI / f
• Processor speed is often measured in terms of millions of
instructions per second, frequently called the MIPS rate of
the processor.
MIPS Rate
• The MIPS rate is directly proportional to the clock
rate and inversely proportional to the CPI.
• All four system attributes (instruction set,
compiler, processor, and memory technologies)
affect the MIPS rate, which varies also from
program to program.
MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6) = (f × Ic) / (C × 10^6)
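A quick check that the three forms agree, using assumed numbers (a 40 MHz clock, CPI of 2, and a one-million-instruction program):

```c
/* MIPS rate computed three equivalent ways (hypothetical inputs):
   Ic/(T*10^6) = f/(CPI*10^6) = f*Ic/(C*10^6). */
#include <stdio.h>

int main(void) {
    double f   = 40e6;        /* clock rate: 40 MHz */
    double CPI = 2.0;
    double Ic  = 1e6;         /* instruction count */
    double C   = Ic * CPI;    /* total clock cycles */
    double T   = C / f;       /* CPU time: 0.05 s */
    printf("%g = %g = %g MIPS\n",
           Ic / (T * 1e6), f / (CPI * 1e6), f * Ic / (C * 1e6)); /* all 20 */
    return 0;
}
```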
Throughput Rate
• System throughput, Ws, is the number of programs a system can
execute per unit time, in programs per second.
• CPU throughput, Wp, is defined as
Wp = f / (Ic × CPI)
• In a multiprogrammed system, the system throughput is often
less than the CPU throughput.
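As a worked example with the same assumed numbers as the MIPS sketch above (f = 40 MHz, CPI = 2, Ic = 10^6 instructions per program): Wp = f / (Ic × CPI) = 40 × 10^6 / (10^6 × 2) = 20 programs per second.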
Floating Point Operations per Second
• Abbreviated as flops.
• Used mostly in compute-intensive applications in science and
engineering.
• With prefixes mega (10^6), giga (10^9), tera (10^12), or peta (10^15),
written as megaflops (mflops), gigaflops (gflops), teraflops, or
petaflops.
Example 1. VAX/780 and IBM RS/6000
• The instruction count on the RS/6000 is 1.5 times
that of the code on the VAX.
• Average CPI on the VAX is assumed to be 5.
• Average CPI on the RS/6000 is assumed to be 1.39.
• VAX has typical CISC architecture.
• RS/6000 has typical RISC architecture.
Machine Clock Performance CPU Time
VAX 11/780 5 MHz 1 MIPS 12x seconds
IBM RS/6000 25 MHz 18 MIPS x seconds
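The table entries follow from the earlier formulas: MIPS(VAX) = 5 MHz / 5 = 1 MIPS and MIPS(RS/6000) = 25 MHz / 1.39 ≈ 18 MIPS. With the RS/6000 executing 1.5 times as many instructions, the CPU-time ratio is T(VAX) / T(RS/6000) = (Ic × 5 / 5 MHz) / (1.5 × Ic × 1.39 / 25 MHz) ≈ 12, hence 12x seconds versus x seconds.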
Programming Environments
• Programmability depends on the programming environment
provided to the users.
• Conventional computers are used in a sequential programming
environment with tools developed for a uniprocessor computer.
• Parallel computers need parallel tools that allow specification or
easy detection of parallelism and operating systems that can
perform parallel scheduling of concurrent events, shared
memory allocation, and shared peripheral and communication
links.
Implicit Parallelism
• Use a conventional language (like C, Fortran, Lisp, or Pascal) to
write the program.
• Use a parallelizing compiler to translate the source code into
parallel code.
• The compiler must detect parallelism and assign target machine
resources.
• Success relies heavily on the quality of the compiler.
• Kuck (U. of Illinois) and Kennedy (Rice U.) used this approach.
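A sketch of the distinction such a compiler must draw (illustrative C, not from the text): the first loop's iterations are independent and can be parallelized automatically, while the second carries a dependence from one iteration to the next and cannot, at least not without transformation:

```c
/* What a parallelizing compiler looks for (illustrative). */
void independent(int n, double *a, const double *b) {
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * b[i];     /* no inter-iteration dependence:
                                  iterations may run in parallel */
}

void dependent(int n, double *a) {
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + 1.0; /* iteration i needs iteration i-1:
                                  a loop-carried dependence blocks
                                  naive parallelization */
}
```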
Explicit Parallelism
• Programmers write explicit parallel code using parallel dialects of
common languages.
• Compiler has reduced need to detect parallelism, but must still
preserve existing parallelism and assign target machine
resources.
• Seitz (Cal Tech) and Dally (MIT) used this approach.
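A minimal sketch of the explicit style in C with POSIX threads (one real parallel dialect available to C programmers; the hand partitioning is the point, the numbers are arbitrary):

```c
/* Explicit parallelism: the programmer partitions a sum by hand. */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 4

static double a[N];
static double partial[NTHREADS];

static void *sum_chunk(void *arg) {
    long t = (long)arg;                        /* thread index */
    long lo = t * (N / NTHREADS), hi = lo + N / NTHREADS;
    double s = 0.0;
    for (long i = lo; i < hi; i++) s += a[i];
    partial[t] = s;
    return NULL;
}

int main(void) {
    for (long i = 0; i < N; i++) a[i] = 1.0;
    pthread_t tid[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, sum_chunk, (void *)t);
    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += partial[t];
    }
    printf("sum = %f\n", total);               /* 1000000.000000 */
    return 0;
}
```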
Needed Software Tools
• Parallel extensions of conventional high-level languages.
• Integrated environments to provide
– different levels of program abstraction
– validation, testing and debugging
– performance prediction and monitoring
– visualization support to aid program development and
performance measurement
– graphics display and animation of computational results
Thank you