CS 426 – Parallel Computing


Lecture Contents (Tentative)

Begins

Lecture Contents

Lecture Slides

Feb 4th

Introduction

  • Motivating Parallelism
  • Scope of Parallel Computing
  • Organization and Contents of the Text

 Set1

Feb 11th

Parallel Programming Platforms

  • Implicit Parallelism: Trends in Microprocessor Architectures
  • Limitations of Memory System Performance

 Set2

Feb 18th

Parallel Programming Platforms

  • Dichotomy of Parallel Computing Platforms
  • Physical Organization of Parallel Platforms
  • Communication Costs in Parallel Machines
  • Routing Mechanisms for Interconnection Networks

 Set3

 Set4

Feb 25th

Basic Communication Operations

  • One-to-All Broadcast and All-to-One Reduction
  • All-to-All Broadcast and Reduction
  • All-Reduce and Prefix-Sum Operations
  • Scatter and Gather
  • All-to-All Personalized Communication
  • Circular Shift
  • Improving the Speed of Some Communication Operations

 Set5

Mar 4th

Principles of Parallel Algorithm Design

  • Decomposition Techniques

  

Mar 11th

Principles of Parallel Algorithm Design

  • Characteristics of Tasks and Interactions
  • Mapping Techniques for Load Balancing

Set6

Project 1 out

Mar 18th

Principles of Parallel Algorithm Design

  • Methods for Containing Interaction Overheads
  • Parallel Algorithm Models

HW1 out

Mar 25th

Programming Using the Message Passing Paradigm

  • Principles of Message-Passing Programming
  • The Building Blocks: Send and Receive Operations
  • MPI: The Message Passing Interface

Project 1 in

HW1 in

Apr 1st

Midterm – April 5th, Friday during regular class hours

Programming Using the Message Passing Paradigm

  • Topologies and Embedding
  • Overlapping Communication with Computation
  • Collective Communication and Computation Operations
  • Groups and Communicators

Set7

Midterm

Project 2 out

Apr 8th

Analytical Modeling of Parallel Programs

  • Sources of Overhead in Parallel Programs
  • Performance Metrics for Parallel Systems
  • Effect of Granularity and Data Mapping on Performance
  • Scalability of Parallel Systems
  • Minimum Execution Time and Minimum Cost-Optimal Execution Time

Parallel computing kernels

  • Matrix transposition
  • Matrix-vector multiplication
  • Matrix-matrix multiplication
  • Matrix partitioning schemes for load-balancing and communication minimization

 

Apr 15th

Programming Shared Address Space Platforms

  • Thread Basics
  • Tips for Designing Asynchronous Programs
  • History of OpenMP

 

Set8

Project 2 in

Project 3 out

Apr 22nd

{22 & 23 April, Mon. & Tue., holiday!}

Multicore programming

  • Compiling and Running OpenMP programs
  • Shared Memory Systems
  • Concepts in OpenMP

Set9

Apr 29th

{1st May, Wed., holiday!}

GPU programming

  • Hardware Overview
  • Performance

Project 3 in

Project 4 out

Set10

May 6th

GPU programming

  • Software Environment – Programming Models
  • GPU Memory
  • CUDA Programming Model

Set11 

May 13th

GPU programming

  • CUDA Programming Model
  • OpenCL – Open Computing Language

 

Wednesday, May 15th is the last day of classes

Project 4 in

Set12

May 20th

Final – May 25th, 9:00 AM – 12:00 PM