- The Center for Education and Research in Information Assurance and Security (CERIAS)
- Analysis of parallel algorithms
- Algorithms and Parallel Computing (Wiley Series on Parallel and Distributed Computing)
In computer science, the analysis of parallel algorithms is the process of finding the computational complexity of algorithms executed in parallel: the amount of time, storage, or other resources needed to execute them. In many respects, analysis of parallel algorithms is similar to the analysis of sequential algorithms, but it is generally more involved because one must reason about the behavior of multiple cooperating threads of execution. One of the primary goals of parallel analysis is to understand how a parallel algorithm's use of resources (speed, space, etc.) changes as the number of processors is changed.
An algorithm is a sequence of steps that takes inputs from the user and, after some computation, produces an output. A parallel algorithm is an algorithm that can execute several instructions simultaneously on different processing devices and then combine all the individual outputs to produce the final result.
The easy availability of computers, along with the growth of the Internet, has changed the way we store and process data. We are living in an age where data is available in abundance. Every day we deal with huge volumes of data that require complex computation, and the results are needed quickly. Sometimes, we need to fetch data from similar or interrelated events that occur simultaneously.
This is where we require concurrent processing, which can divide a complex task and process it on multiple systems to produce the output quickly. Concurrent processing is essential where the task involves processing a large volume of complex data. Parallelism is the simultaneous processing of several sets of instructions.
It reduces the total computational time. Parallelism can be implemented by using parallel computers. Parallel computers require parallel algorithms, programming languages, compilers, and an operating system that supports multitasking. In this tutorial, we will discuss only parallel algorithms. Before moving further, let us first discuss algorithms and their types. An algorithm is a sequence of instructions followed to solve a problem.
While designing an algorithm, we should consider the architecture of the computer on which the algorithm will be executed. In a parallel algorithm, the problem is divided into sub-problems that are executed in parallel to produce individual outputs. Later on, these individual outputs are combined to get the final desired output.
It is not easy to divide a large problem into sub-problems. Sub-problems may have data dependencies among them, so the processors have to communicate with each other to solve the problem. The time the processors spend communicating with each other can exceed the actual processing time. So, while designing a parallel algorithm, proper CPU utilization should be considered to get an efficient algorithm. To design an algorithm properly, we must have a clear idea of the basic model of computation in a parallel computer.
Both sequential and parallel computers operate on a stream of instructions called an algorithm. This set of instructions (the algorithm) tells the computer what it has to do at each step. Depending on the instruction stream and the data stream, computers can be classified into four categories: SISD, SIMD, MISD, and MIMD. In SISD (single instruction stream, single data stream) computers, the processor receives a single stream of instructions from the control unit and operates on a single stream of data from the memory unit.
During computation, at each step, the processor receives one instruction from the control unit and operates on a single data item received from the memory unit. SIMD (single instruction stream, multiple data stream) computers contain one control unit, multiple processing units, and a shared memory or an interconnection network. Here, a single control unit sends instructions to all processing units. During computation, at each step, all the processors receive a single set of instructions from the control unit and operate on different sets of data from the memory unit.
Each of the processing units has its own local memory unit to store both data and instructions. In SIMD computers, processors need to communicate among themselves. This is done through the shared memory or the interconnection network.
While some of the processors execute a set of instructions, the remaining processors wait for their next set of instructions. Instructions from the control unit decide which processors will be active (execute instructions) or inactive (wait for the next instruction). As the name suggests, MISD (multiple instruction stream, single data stream) computers contain multiple control units, multiple processing units, and one common memory unit.
Here, each processor has its own control unit, and the processors share a common memory unit. All the processors get instructions individually from their own control units and operate on a single stream of data according to those instructions. These processors operate simultaneously. MIMD (multiple instruction stream, multiple data stream) computers have multiple control units, multiple processing units, and a shared memory or an interconnection network.
Here, each processor has its own control unit, local memory unit, and arithmetic and logic unit. The processors receive different sets of instructions from their respective control units and operate on different sets of data. MIMD computers that share a common memory are known as multiprocessors, while those that use an interconnection network are known as multicomputers.
Analysis of an algorithm helps us determine whether the algorithm is useful or not. Generally, an algorithm is analyzed based on its execution time (time complexity) and the amount of space (space complexity) it requires.
Since memory devices are now available at reasonable cost, storage space is less often the limiting factor.
Hence, space complexity is usually given less importance than time complexity. Parallel algorithms are designed to improve the computation speed of a computer; the main reason for developing them is to reduce the computation time of an algorithm.
Thus, evaluating the execution time of an algorithm is extremely important in analyzing its efficiency. Execution time is measured as the time taken by the algorithm to solve a problem. The total execution time is calculated from the moment the algorithm starts executing to the moment it stops.
If the processors do not all start or end execution at the same time, then the total execution time of the algorithm is measured from the moment the first processor starts its execution to the moment the last processor stops. The complexity of an algorithm is expressed as the number of steps it executes to get the desired output.
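The rule for total execution time when processors start and stop at different moments can be sketched as follows. This is a minimal illustration; the function name `total_execution_time` and the sample timings are assumptions made for the example, not part of the original text.

```python
def total_execution_time(intervals):
    """Return the time span from the earliest start to the latest stop.

    `intervals` is a list of (start, stop) timestamps, one per processor.
    """
    if not intervals:
        raise ValueError("at least one processor interval is required")
    first_start = min(start for start, _ in intervals)
    last_stop = max(stop for _, stop in intervals)
    return last_stop - first_start


# Hypothetical timings (in seconds) for three processors.
timings = [(0.0, 4.0), (0.5, 3.0), (1.0, 6.5)]
print(total_execution_time(timings))  # 6.5
```

The second processor finishes early, but it does not shorten the total: only the first start and the last stop matter.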
Asymptotic analysis is used to calculate the complexity of an algorithm in its theoretical analysis. In asymptotic analysis, the complexity function of the algorithm is evaluated for large input sizes.
A line and a curve are asymptotic to each other when the distance between them approaches zero as they extend toward infinity. Asymptotic notation is a convenient way to describe the fastest and slowest possible execution times of an algorithm using upper and lower bounds. In mathematics, Big O notation is used to represent the asymptotic characteristics of functions. It describes the behavior of a function for large inputs in a simple and accurate way. Applied to running time, it gives an upper bound on the time the algorithm could take to complete its execution.
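For reference, the standard formal definition of Big O can be written as below. The symbols f, g, c, and n_0 are introduced here for illustration and do not appear in the text above.

```latex
f(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 > 0 \ \text{such that}\ 
0 \le f(n) \le c \cdot g(n) \ \text{for all}\ n \ge n_0
```

As a worked instance, 3n^2 + 5n = O(n^2): taking c = 4 and n_0 = 5, the inequality 3n^2 + 5n <= 4n^2 reduces to 5n <= n^2, which holds for every n >= 5.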
The performance of a parallel algorithm is determined by calculating its speedup. Speedup is defined as the ratio of the worst-case execution time of the fastest known sequential algorithm for a particular problem to the worst-case execution time of the parallel algorithm.
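The speedup ratio can be illustrated with a small calculation. The sketch below assumes step counts for summing n numbers (n - 1 additions sequentially, and ceil(log2 n) parallel steps for a pairwise tree reduction on n / 2 processors); these figures and the function name `speedup` are assumptions chosen for the example, not taken from the text.

```python
import math


def speedup(sequential_time, parallel_time):
    """Speedup = worst-case sequential time / worst-case parallel time."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return sequential_time / parallel_time


# Illustrative step counts for summing n numbers (assumed, not from the text):
# a sequential loop needs n - 1 additions, a pairwise tree reduction on
# n / 2 processors needs ceil(log2(n)) parallel steps.
n = 1024
sequential_steps = n - 1
parallel_steps = math.ceil(math.log2(n))
print(speedup(sequential_steps, parallel_steps))  # 102.3
```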
The number of processors used is an important factor in analyzing the efficiency of a parallel algorithm. The cost to buy, maintain, and run the computers is taken into account. The larger the number of processors an algorithm uses to solve a problem, the more costly the result becomes.
The total cost of a parallel algorithm is the product of its time complexity and the number of processors it uses. The model of a parallel algorithm is developed by considering a strategy for dividing the data and the processing method, and by applying a suitable strategy to reduce interactions.
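Before turning to the individual models, the cost definition above (time complexity multiplied by the number of processors) can be checked with a small calculation. The function name `parallel_cost` and the figures used are assumptions for illustration only.

```python
def parallel_cost(parallel_time, num_processors):
    """Cost = parallel running time x number of processors used."""
    if parallel_time < 0 or num_processors < 1:
        raise ValueError("need parallel_time >= 0 and num_processors >= 1")
    return parallel_time * num_processors


# Assumed figures: summing 1024 numbers by tree reduction takes 10 parallel
# steps on 512 processors, versus 1023 steps on a single processor.
cost = parallel_cost(parallel_time=10, num_processors=512)
print(cost)          # 5120
print(cost > 1023)   # True
```

Under these assumed figures the parallel version finishes far sooner, yet its cost is higher than the sequential step count, which is the trade-off the cost measure is meant to expose.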
In the data-parallel model, tasks are assigned to processes and each task performs similar types of operations on different data. Data parallelism is a consequence of a single operation being applied to multiple data items. The data-parallel model can be applied to both shared-address-space and message-passing paradigms.
In the data-parallel model, interaction overheads can be reduced by selecting a locality-preserving decomposition, by using optimized collective interaction routines, or by overlapping computation and interaction. The primary characteristic of data-parallel problems is that the degree of data parallelism increases with the size of the problem, which in turn makes it possible to use more processes to solve larger problems.
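A minimal sketch of the data-parallel idea: the same operation is applied to different contiguous chunks of the data, and the partial results are combined. The helper names (`split_into_chunks`, `sum_of_squares`, `data_parallel_sum_of_squares`) are hypothetical. Threads are used only to keep the sketch self-contained; in standard CPython, CPU-bound work would typically use processes instead.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_chunks):
    """Split `data` into at most `num_chunks` contiguous slices."""
    chunk_size = max(1, -(-len(data) // num_chunks))  # ceiling division
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def sum_of_squares(chunk):
    """The single operation applied to every chunk."""
    return sum(x * x for x in chunk)


def data_parallel_sum_of_squares(data, num_workers=4):
    """Apply the same operation to each chunk, then combine the results."""
    chunks = split_into_chunks(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


print(data_parallel_sum_of_squares(list(range(10))))  # 285
```

Contiguous slices are one simple form of locality-preserving decomposition: each worker touches a single adjacent region of the input.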
In the task graph model, parallelism is expressed by a task graph. A task graph can be either trivial or nontrivial.
In this model, the relationships among the tasks are used to promote locality or to minimize interaction costs. This model is used to solve problems in which the quantity of data associated with the tasks is large compared to the amount of computation associated with them.
The tasks are assigned so as to reduce the cost of data movement among them. Here, problems are divided into atomic tasks and represented as a graph. Each task is an independent unit of work that may depend on one or more antecedent tasks. After an antecedent task completes, its output is passed to the dependent task. A task with antecedent tasks starts execution only when all of its antecedent tasks have completed. The final output of the graph is obtained when the last dependent task is completed (Task 6 in the accompanying figure).
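The execution rule described above (a task starts only after all its antecedent tasks finish, and receives their outputs) can be sketched with Python's standard `graphlib.TopologicalSorter`. The six-task graph below is hypothetical, since the original figure is not available, and `run_task_graph` is an illustrative name.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def run_task_graph(dependencies, actions, max_workers=4):
    """Run each task once all of its antecedent tasks have completed.

    dependencies: {task: set of antecedent tasks}
    actions: {task: function taking a dict of antecedent outputs}
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    outputs = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose antecedents are all done.
            for task in sorter.get_ready():
                inputs = {dep: outputs[dep] for dep in dependencies.get(task, ())}
                running[pool.submit(actions[task], inputs)] = task
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in finished:
                task = running.pop(future)
                outputs[task] = future.result()
                sorter.done(task)  # unlocks the dependent tasks
    return outputs


# Hypothetical six-task graph; t6 is the last dependent task.
dependencies = {
    "t1": set(), "t2": set(),
    "t3": {"t1"}, "t4": {"t1", "t2"},
    "t5": {"t3"}, "t6": {"t4", "t5"},
}
actions = {
    "t1": lambda inputs: 1,
    "t2": lambda inputs: 2,
    "t3": lambda inputs: inputs["t1"] + 10,
    "t4": lambda inputs: inputs["t1"] * inputs["t2"],
    "t5": lambda inputs: inputs["t3"] * 2,
    "t6": lambda inputs: inputs["t4"] + inputs["t5"],
}
results = run_task_graph(dependencies, actions)
print(results["t6"])  # 24
```

Independent tasks such as t1 and t2 may run concurrently, while t6 waits for both t4 and t5.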
In the work pool model, tasks are dynamically assigned to the processes to balance the load. Therefore, any process may potentially execute any task.
This model is used when the quantity of data associated with tasks is comparatively smaller than the computation associated with the tasks. Tasks are not pre-assigned to processes. The assignment of tasks may be centralized or decentralized. Pointers to the tasks are saved in a physically shared list, priority queue, hash table, or tree, or they may be saved in a physically distributed data structure.
The tasks may be available at the beginning or may be generated dynamically. If tasks are generated dynamically and assigned in a decentralized way, then a termination detection algorithm is required so that all the processes can detect the completion of the entire program and stop looking for more tasks.
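A minimal sketch of a work pool with a centralized, shared task queue and dynamically generated tasks. Here the queue's own bookkeeping (`Queue.join()`) stands in for termination detection; a decentralized scheme would need a dedicated algorithm. The function name `run_work_pool`, the range-splitting workload, and the `threshold` parameter are assumptions chosen for illustration.

```python
import queue
import threading


def run_work_pool(lo, hi, num_workers=4, threshold=1000):
    """Sum the integers in [lo, hi) using a shared pool of range tasks."""
    tasks = queue.Queue()          # shared task list
    total = 0
    total_lock = threading.Lock()

    def worker():
        nonlocal total
        while True:
            task = tasks.get()     # any worker may pick up any task
            if task is None:       # sentinel: no more work
                tasks.task_done()
                return
            start, stop = task
            if stop - start > threshold:
                # Generate two smaller tasks dynamically.
                mid = (start + stop) // 2
                tasks.put((start, mid))
                tasks.put((mid, stop))
            else:
                partial = sum(range(start, stop))
                with total_lock:
                    total += partial
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    tasks.put((lo, hi))
    tasks.join()                   # returns once every task, including generated ones, is done
    for _ in threads:
        tasks.put(None)            # tell each worker to stop looking for tasks
    for t in threads:
        t.join()
    return total


print(run_work_pool(0, 10000))  # 49995000
```

New tasks are enqueued before the parent task is marked done, so the count of unfinished tasks cannot reach zero while work remains.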
In the master-slave model, one or more master processes generate tasks and allocate them to slave processes. This model is generally equally suitable for shared-address-space and message-passing paradigms, since the interaction is naturally two-way. In some cases, a task may need to be completed in phases, and the tasks in each phase must be completed before the tasks in the next phase can be generated. The master-slave model can be generalized to a hierarchical or multi-level master-slave model, in which the top-level master feeds a large portion of the tasks to second-level masters, each of which further subdivides the tasks among its own slaves and may perform part of the work itself.
Care should be taken to ensure that the master does not become a congestion point. This may happen if the tasks are too small or the workers are comparatively fast. The tasks should be selected so that the cost of performing a task dominates the cost of communication and the cost of synchronization. Asynchronous interaction may help overlap interaction with the computation associated with work generation by the master.
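The two-way interaction described above can be sketched with message queues: the master generates batch tasks, allocates them to workers, and combines the results they send back. The names `worker_loop`, `master`, and `batch_size` are hypothetical, and round-robin allocation is just one possible choice. Grouping items into batches is a simple way to make the work per task larger relative to the cost of each message.

```python
import queue
import threading


def worker_loop(inbox, results):
    """Worker: receive a batch from the master, send back its result."""
    while True:
        message = inbox.get()
        if message is None:            # master signals that no work remains
            return
        task_id, batch = message
        results.put((task_id, sum(x * x for x in batch)))


def master(data, num_workers=3, batch_size=4):
    """Master: generate batch tasks, allocate them round-robin, combine results."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    results = queue.Queue()
    workers = [threading.Thread(target=worker_loop, args=(inbox, results))
               for inbox in inboxes]
    for w in workers:
        w.start()

    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    for task_id, batch in enumerate(batches):
        inboxes[task_id % num_workers].put((task_id, batch))
    for inbox in inboxes:
        inbox.put(None)                # one stop signal per worker

    # Collect exactly one result per batch that was sent out.
    total = sum(results.get()[1] for _ in batches)
    for w in workers:
        w.join()
    return total


print(master(list(range(10))))  # 285
```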
Analysis of parallel algorithms
Since the release of the text "Introduction to Parallel Computing: Design and Analysis of Algorithms" by the same authors, the field of parallel computing.
Algorithms and Parallel Computing (Wiley Series on Parallel and Distributed Computing)
Introduction to Parallel Computing, 2e provides a basic, in-depth look at techniques for the design and analysis of parallel algorithms and for programming them on commercially available parallel platforms.
Design and Analysis of Distributed Algorithms (Wiley Series on Parallel and Distributed Computing)
Advancements in microprocessor architecture, interconnection technology, and software development have fueled rapid growth in parallel and distributed computing. However, this development is only of practical benefit if it is accompanied by progress in the design, analysis and programming of parallel algorithms. This concise textbook provides, in one place, three mainstream parallelization approaches, OpenMP, MPI and OpenCL, for multicore computers, interconnected computers and graphical processing units. An overview of practical parallel computing and principles will enable the reader to design efficient parallel programs for solving various computational problems on state-of-the-art personal computers and computing clusters.
Kumar and A. Grama and A. Gupta and G.