Stewart Weiss
Copyright 2021-24 Stewart Weiss. Unless noted otherwise all content is released under a Creative Commons Attribution-ShareAlike 4.0 International License. Background image: George Washington Bridge at Dusk, by Stewart Weiss.
© Stewart Weiss. CC-BY-SA.
Things that operate in parallel but that work together must occasionally communicate to achieve their goals, such as
computers working together
processes working together
people working together
When a system of parallel, discrete entities works together and its members communicate with each other, they form a network.
This chapter looks at the various ways in which components of parallel systems can be connected.
Imagine three processes that work together. One reads a file containing numbers whose sum is to be calculated.
It sends half the file to a second process and the remaining half to a third process.
These two processes add their numbers independently and send their sums to the first process, which adds them and prints them out.
If we call them processes 1, 2, and 3 and represent them as nodes, their communication pattern looks like this:
1 and 2 communicate, and 1 and 3 communicate, but not 2 and 3.
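This pattern can be simulated with Python's multiprocessing module. This is only a sketch of the example above; the function and variable names are illustrative. Note that process 1 holds a channel to each of processes 2 and 3, but there is no channel between 2 and 3.

```python
# Simulation of the three-process sum. Processes 2 and 3 each receive half
# of the data, compute a partial sum, and send it back to process 1.
from multiprocessing import Process, Pipe

def worker(conn):
    nums = conn.recv()       # receive one half of the data from process 1
    conn.send(sum(nums))     # send the partial sum back to process 1
    conn.close()

def parallel_sum(data):
    half = len(data) // 2
    # Process 1 holds one end of each pipe: edges 1-2 and 1-3 exist,
    # but there is no channel between processes 2 and 3.
    c2, w2 = Pipe()
    c3, w3 = Pipe()
    p2 = Process(target=worker, args=(w2,))
    p3 = Process(target=worker, args=(w3,))
    p2.start(); p3.start()
    c2.send(data[:half])     # process 1 sends half the file to process 2
    c3.send(data[half:])     # ... and the rest to process 3
    total = c2.recv() + c3.recv()   # process 1 adds the two partial sums
    p2.join(); p3.join()
    return total

if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))   # 5050
```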
The preceding example illustrates a network topology.
A network topology formalizes the way in which a set of nodes are "connected" to each other.
A network topology is essentially a discrete graph : a set of nodes connected by edges.
Edges may or may not have direction. An edge is undirected if information can flow across it in both directions; if information flows in only one direction, the edge is directed and is often drawn with an arrow.
Physical distance does not exist: edges do not have length.
A network topology is a type of mathematical topological space.
Suppose that in the preceding example, process 2 sends half its data to processes 4 and 5, and process 3 sends half its data to 6 and 7.
Each of 4,5,6, and 7 add up their numbers independently. Processes 4 and 5 send their sums to 2, and 6 and 7 send theirs to 3.
Then processes 2 and 3 send their sums to 1, which sums them and prints them out. The communication pattern looks like this:
A path from a node S to a node T is a sequence of nodes S = N0, N1, N2, ..., Nk = T such that there is an edge from Ni to Ni+1 for each i, 0 <= i < k. The length of the path is k, the number of edges it traverses.
The distance between two nodes S and T is the length of the shortest path between them.
The degree of a node is the number of edges that are incident to that node, whether they are incoming or outgoing or undirected.
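These definitions translate directly into code. Here is a small sketch that computes distance (via breadth-first search) and degree on the tree from the earlier seven-process example; the adjacency-list representation is one common choice, not the only one.

```python
# Distance and degree in an undirected topology stored as an adjacency list.
from collections import deque

def distance(adj, s, t):
    """Length of the shortest path from s to t, found by BFS."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return None  # no path exists

def degree(adj, u):
    return len(adj[u])

# The 7-node tree from the running example: 1-2, 1-3, 2-4, 2-5, 3-6, 3-7.
adj = {1: [2, 3], 2: [1, 4, 5], 3: [1, 6, 7],
       4: [2], 5: [2], 6: [3], 7: [3]}
print(distance(adj, 4, 6))  # path 4 -> 2 -> 1 -> 3 -> 6, so distance 4
print(degree(adj, 2))       # node 2 has edges to 1, 4, and 5: degree 3
```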
Properties of a topology affect performance, scalability, cost, and feasibility of using it.
Size: This is the number of nodes in the topology.
Diameter: This is the maximum distance between any two nodes in the topology.
Bisection Width: This is the minimum number of edges that must be removed to split the vertex set into two equal-size sets of vertices (or, if the number of nodes is odd to start, two sets whose sizes differ by 1).
Degree: This is the maximum degree of any node in the topology.
Although in mathematical topologies edges do not have length, another property of topologies that describe physical networks, such as CPUs on a chip or board or computers connected together, is the maximum length of any edge among them.
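The size, diameter, and degree of a small topology can be checked by brute force. The following sketch does so for a 3x3 mesh (described later in the chapter); the representation and helper names are illustrative.

```python
# Computing size, diameter, and maximum degree of a topology by running
# BFS from every node. Fine for small graphs; not meant to be efficient.
from collections import deque
from itertools import product

def bfs_dists(adj, s):
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def diameter(adj):
    # Largest distance over all pairs of nodes.
    return max(max(bfs_dists(adj, s).values()) for s in adj)

def max_degree(adj):
    return max(len(neighbors) for neighbors in adj.values())

# A 3x3 two-dimensional mesh (no wraparound edges).
n = 3
mesh = {(r, c): [] for r, c in product(range(n), range(n))}
for (r, c) in mesh:
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if (r + dr, c + dc) in mesh:
            mesh[(r, c)].append((r + dr, c + dc))

print(len(mesh), diameter(mesh), max_degree(mesh))  # 9 4 4
```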
Certain regular topologies arise as a result of algorithms or design decisions that are made in parallel systems.
Some that we study include:
Binary tree
Fully-connected (also called completely-connected)
Mesh and torus
Hypercube (also called a binary n-cube)
Butterfly
In a binary tree network, the number of nodes, N, is equal to 2^k -1 for some k, and these nodes are arranged in a complete binary tree of depth k-1.
They have sizes 1, 3, 7, 15, 31, and so on.
It is rarely used as a physical network among cores.
It mostly occurs when tasks communicate using certain types of algorithms, such as our earlier example.
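A complete binary tree of depth k-1 can be built with heap-style numbering, where node i has children 2i and 2i+1. This sketch constructs the tree for k = 4 and confirms its size and maximum degree (an internal node touches its parent and two children, so the degree of the topology is 3).

```python
# Complete binary tree with N = 2^k - 1 nodes, heap-style numbering:
# node i has children 2i and 2i + 1.
k = 4
N = 2 ** k - 1                     # 15 nodes for k = 4
adj = {i: [] for i in range(1, N + 1)}
for i in range(1, N + 1):
    for child in (2 * i, 2 * i + 1):
        if child <= N:
            adj[i].append(child)   # downward edge to the child
            adj[child].append(i)   # and the matching upward edge
print(N, max(len(v) for v in adj.values()))  # 15 3
```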
Every node is connected to every other node.
It is rarely used to connect cores, but it often represents the possible connections between tasks or processes.
A mesh network looks like a grid in two dimensions, or a wire mesh cube in three dimensions, but is harder to visualize when there are more than 3 dimensions.
Formally, it is called a lattice.
A mesh does not have to have the same size in each dimension.
When a mesh has two dimensions, if we connect the nodes at the top and bottom of each column by edges, and the nodes at the left and right ends of each row by edges, we have a torus. It looks like a doughnut.
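The wraparound edges mean that every node in a 2D torus has exactly four neighbors, which can be computed with modular arithmetic. A small sketch (the dimensions are arbitrary):

```python
# Neighbors of node (r, c) in a rows x cols torus: the wraparound edges
# make neighbor coordinates simple modular arithmetic, and give every
# node degree 4 regardless of position.
rows, cols = 4, 4

def torus_neighbors(r, c):
    return [((r + 1) % rows, c), ((r - 1) % rows, c),
            (r, (c + 1) % cols), (r, (c - 1) % cols)]

print(torus_neighbors(0, 0))  # the corner wraps to row 3 and column 3
```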
A binary n-cube network, also called a hypercube, is a network with N = 2^n nodes arranged as the vertices of an n-dimensional cube.
A hypercube is simply a generalization of an ordinary cube.
In a 3D cube, there are N = 8 = 2^3 nodes, each connected to 3 nodes, one in each dimension.
A square is a 2D cube. It has N = 4 = 2^2 nodes, each connected to 2 nodes, one in each dimension.
In general, in an n-cube, each node is connected to n other nodes, one in each dimension.
This is a 4-cube:
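If node labels are written as n-bit numbers, two hypercube nodes are adjacent exactly when their labels differ in one bit, so a node's n neighbors are found by flipping each bit in turn. A sketch for the 4-cube:

```python
# Neighbors of a node in a binary n-cube: labels are n-bit numbers, and
# flipping each of the n bits gives the n neighbors, one per dimension.
n = 4

def cube_neighbors(v):
    return [v ^ (1 << b) for b in range(n)]

print(len(cube_neighbors(0)))           # every node has degree n = 4
print(sorted(cube_neighbors(0b0000)))   # neighbors of 0000: 1, 2, 4, 8
```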
A butterfly network topology consists of (k+1)2^k nodes arranged in k+1 ranks (i.e., rows), each containing n=2^k nodes. k is called the order of the network.
The ranks are labeled 0 through k. The columns in the figure are labeled 0 through 2^k - 1.
A butterfly of order 0 has 1 node.
A butterfly of order 1 has 4 nodes - 2 rows with 2 nodes each.
A butterfly of order 2 has 12 nodes - 3 rows with 4 nodes each.
Each node is connected to the node above it and below it in its column.
Nodes are also connected to nodes not in their columns.
It is easier to see how this works if you write rank and column numbers in binary, using k bits for order k. A node [i, j] in rank i has a cross edge to node [i+1, j xor 2^(k-i-1)]. For example, let k = 3 and consider node [1, 3] = [001, 011]. Then 2^(k-i-1) = 2^(3-1-1) = 2 = 010, and the bitwise xor of j = 3 = 011 and 010 is 001 = 1, so [001, 011] has an edge to [010, 001] = [2, 1].
There is a way to "grow" butterfly networks recursively. See the lecture notes.
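The two edge rules above can be turned into a generator for the whole network. This sketch builds all edges of an order-k butterfly and checks the worked example; the node representation as (rank, column) pairs is one convenient choice.

```python
# Edges of an order-k butterfly. Node (i, j) in rank i (0..k) and column
# j (0..2^k - 1) connects down to (i+1, j) ("straight" edge) and to
# (i+1, j ^ 2^(k-i-1)) ("cross" edge, which flips one bit of j).
def butterfly_edges(k):
    edges = []
    for i in range(k):               # ranks 0 .. k-1 connect downward
        for j in range(2 ** k):
            edges.append(((i, j), (i + 1, j)))                     # straight
            edges.append(((i, j), (i + 1, j ^ 2 ** (k - i - 1))))  # cross
    return edges

edges = butterfly_edges(3)
print(((1, 3), (2, 1)) in edges)   # the worked example: [1,3] -> [2,1]
print((3 + 1) * 2 ** 3)            # (k+1) * 2^k = 32 nodes for order 3
```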
An interconnection network is a system of links that connects one or more devices to each other for the purpose of inter-device communication.
In the context of computer architecture, an interconnection network is used primarily to connect processors to processors, or to allow multiple processors to connect to one or more shared memory modules (as in SMPs.)
Sometimes they are used to connect processors with locally attached memories to each other. The interconnection network has a significant effect on the cost, applicability, scalability, reliability, and performance of a parallel computer.
An interconnection network may be classified as shared or switched.
A shared network can have at most one message on it at any time. For example, a bus is a shared network, as is traditional Ethernet.
A switched network allows point-to-point messages among pairs of nodes and therefore supports the transfer of multiple concurrent messages. It is a collection of interconnected switches. Switched Ethernet is a switched network.
Shared networks are inferior to switched networks in terms of performance and scalability. In the example to the left, the switched network allows messages to travel over two links simultaneously.
Because interconnection networks can connect different types of devices, they have been used in different ways.
In one approach, each node is a switch, and exactly one device is connected to that switch. This is called a direct topology. The mesh to the right is a direct topology.
The convention is that switches are drawn with circles and devices such as processors, computers, or memories, are drawn with squares.
In another approach, called an indirect topology, the number of switches is greater than the number of device nodes. The switches are used for routing messages from one processor or device to another. The binary tree to the right is used as a switching network in which the processors are connected only to the leaf nodes.
Meshes are almost always used as a direct topology, with a processor attached to each switch.
Binary trees are always indirect topologies, acting as a switching network to connect a bank of processors to each other.
A multiprocessor is a computer with multiple CPUs and a shared address space.
Most multiprocessors are one of two types:
those in which the shared memory is physically in one place, and
those in which it is distributed among the processors.
Regardless of where physical memory is located, parallel programs running on multiprocessors can take advantage of shared memory.
When the shared memory is a physically central memory in one place, the machine is called any of: a centralized multiprocessor, a Uniform Memory Access (UMA) multiprocessor, or a symmetric multiprocessor (SMP).
Schematically, it looks like this:
In more detail, like this:
If a multiprocessor does not have a centralized memory, but instead has physical memory distributed across the separate CPUs, it is called either:
distributed multiprocessor, or a
Non-Uniform Memory Access (NUMA) multiprocessor
The CPUs still can access a shared set of addresses, but how they do it is different than if the memory were centralized.
Schematically, a NUMA multiprocessor looks like this:
The important distinction between a multiprocessor and a multicomputer is that a multicomputer is a distributed-memory, multiple-CPU computer in which there is no shared address space among the separate CPUs.
Each CPU has its own address space and can access only its own local memory, which is called private memory.
The same address on two different CPUs refers to two different memory locations.
Processes running on multicomputers must share data through a message-passing interface; they cannot access the data that is used by the other programs.
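Operating-system processes model this situation well: each process has a private address space, so the "same" variable in two processes refers to different memory. This Python sketch (all names are illustrative) shows that a child's update is invisible to the parent unless the value is explicitly sent back.

```python
# Each process gets a private copy of `counter`; a child's update must be
# sent back by message passing (a Queue here) to be seen by the parent.
from multiprocessing import Process, Queue

counter = 0    # every process has its own private copy of this

def increment(q):
    global counter
    counter += 1       # updates only this process's private copy
    q.put(counter)     # the new value must be sent back explicitly

def demo():
    q = Queue()
    p = Process(target=increment, args=(q,))
    p.start()
    child_value = q.get()    # receive the child's result
    p.join()
    return child_value, counter

if __name__ == "__main__":
    print(demo())   # (1, 0): the parent's copy of counter is unchanged
```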
A commercial multicomputer is a multicomputer designed, manufactured, and intended to be sold as a multicomputer.
A commodity cluster is a multicomputer assembled from off-the-shelf components.
A commercial multicomputer's interconnection network and processors are optimized to work with each other, providing low-latency, high-bandwidth connections between the computers, at a higher price tag than a commodity cluster.
Commodity clusters, though, generally have lower performance, with higher latency and lower bandwidth in the interprocessor connections.
Some multicomputers, called asymmetrical multicomputers, are designed with a special front-end computer and back-end computers.
The front-end is the master and gateway for the machine, and the back-end CPUs are used for computation. Users log in to the front-end, their jobs run on the back-end processors, and all I/O takes place through the front-end machine.
A symmetrical multicomputer is one in which all of the hosts are identical and are connected to each other through an interconnection network.
Users can log in to any host, and the file system and I/O devices are equally accessible from every host.
Flynn (1966) categorized parallel hardware based upon a classification scheme with two orthogonal parameters: the instruction stream and the data stream.
In Flynn's Taxonomy, a machine is classified by whether it has a single or multiple instruction streams, and whether it has single or multiple data streams.
There are four possibilities:
SISD: single instruction, single data;
SIMD: single instruction, multiple data;
MISD: multiple instruction, single data;
MIMD: multiple instruction, multiple data (e.g., SMPs, clusters).
A parallel programming model is an abstraction that is independent of the actual hardware. It describes how tasks (processes or threads) can interact with each other and share data.
Several different models:
Shared memory without threads (processes arrange sharing of memory)
Shared memory with threading (Pthreads, OpenMP)
Distributed memory / Message Passing (MPI)
Data parallel
Hybrid
Single Program Multiple Data (SPMD)
Multiple Program Multiple Data (MPMD)
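The shared-memory-with-threads model can be illustrated with Python's threading module (a sketch only; Pthreads and OpenMP are the C-level analogues named above). The threads all see the same address space, so they can update one shared variable, and a lock is needed to avoid a race on that update.

```python
# Shared-memory threading: four threads sum disjoint chunks of the data
# and add their partial sums into one shared variable under a lock.
import threading

total = 0
lock = threading.Lock()

def add_chunk(nums):
    global total
    s = sum(nums)        # compute the partial sum privately ...
    with lock:           # ... then update the shared variable safely
        total += s

data = list(range(1, 101))
threads = [threading.Thread(target=add_chunk, args=(data[i::4],))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)   # 5050: every thread updated the same shared memory
```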
POSIX Threads (Pthreads)
OpenMP
Distributed memory / Message Passing