About

This course is a graduate-level introduction to parallel computing. Its goal is to give you the foundations to develop, analyze, and implement parallel and locality-efficient algorithms and data structures.

Access to course materials

The course appears as “CSE 6220-O01” in OSCAR.

What you’ll learn

When we talk about scalability, “scale” has two senses: efficiency as the problem size grows and efficiency as the system size—as measured by the numbers of cores or compute nodes—grows.

To really scale your algorithm in both senses, you can start by reducing asymptotic complexity (remember your days as a CS 101 n00b?). But then you also need to reduce communication and data movement. This course is about the basic algorithmic techniques you’ll need, especially for the latter.

The course videos focus on theory. But that only gets you so far. So, you will need to supplement this algorithmic theory with hands-on labs on parallel shared-memory and distributed-memory machines; otherwise, how will you know if something that works in theory also works well in practice?

The specific techniques you will encounter cover the main algorithm design and analysis ideas for three major classes of machines:

  1. sequential or parallel machines with deep memory hierarchies (e.g., caches), via external memory models;
  2. multicore and manycore shared memory machines, via the work-span (or “multithreaded DAG”) model;
  3. and distributed memory machines like clusters and supercomputers, via network models.

You will see these techniques applied to fundamental problems, like sorting, searching on trees and graphs, and linear algebra, among others. You will implement several of them on real parallel and distributed systems, using practical programming models such as Cilk Plus, OpenMP, MPI, or possibly others.

Why? (Motivation)

The “standard” undergraduate CS curriculum begins, and usually ends, with the sequential (or serial) random access machine (“RAM”) model. This course helps to fill the gap between algorithm design for serial RAM machines and real machines, which will always have multiple cores, multiple nodes, vector units, and deep memory hierarchies.

HPC vs. HPCA? This course complements CS 6290 / d007: High-Performance Computer Architecture. The main difference between this course (“HPC”) and that course (“HPCA”) is that HPC is about algorithms and HPCA is about hardware (primarily, microprocessors).

So, take both and crush it like nobody’s business!

In retrospect, I guess we should have been called these courses “HPC-A” for algorithms and “HPC-H” for hardware (or “HPC-M” for machines). But that would have been less confusing, and where is the fun in that?

What do I need to know? (Prerequisites)

You should be comfortable designing and analyzing basic sequential algorithms and data structures using “big-Oh-my” notation. That’s the stuff you learned in a first or second course in algorithms and data structures, a la Georgia Tech’s CS 3510 or Udacity’s Intro to Algorithms.

You should also be comfortable programming in C or C++ for the programming assignments. Experience using command-line interfaces in *nix environments (e.g., Unix, Linux) is also helpful, though there are software carpentry materials that may help with that, a la https://software-carpentry.org, for instance.

Please also review the course readiness survey for CSE 6220. It includes some topics in calculus, linear algebra, probability, and algorithms, which are required in many undergraduate CS programs. It’s not so much that you need to remember how to solve exactly those sorts of problems without looking things up. Rather, you should feel confident that you can make time to brush-up as needed. Being able to do so is strongly correlated with the level of mathematical maturity and independence necessary for this class.

Lastly, check out the reviews of this course, in which alumna tell you what they think you actually need to succeed beyond what I’ve spelled out here. https://omscentral.com/courses

What else? (Topics)

We’ve organized the course topics around three different ideas or extensions to the usual serial RAM model of CS 101.

Serial RAM from CS 101

Recall that a serial RAM assumes a sequential or serial processor connected to a main memory.

Unit 1: Two-level memory or I/O models.

Instead of having only a processor connected to a main memory, suppose there is a smaller but faster “scratchpad” memory in between the two. The algorithmic question here is how to use the scratchpad to minimize costly data transfers from main memory.

I/O model

Sub-topics include:

  • Basic models
  • Efficiency metrics, including “emerging” metrics like energy and power
  • I/O-aware algorithms
  • Cache-oblivious algorithms

Unit 2: The work-span or dynamic multithreading model for shared-memory machines.

This model starts from the serial RAM and assumes that there is now more than one processor connected to the main memory. Since they can all “see” the same memory, the processors can coordinate and communicate via reads and writes to that “shared” memory.

PRAM

Sub-topics include:

  • Intro to the basic algorithmic model
  • Intro to OpenMP, a practical programming model
  • Comparison-based sorting algorithms
  • Scans and linked list algorithms
  • Tree algorithms
  • Graph algorithms, e.g., breadth-first search

Unit 3: Distributed memory or network models.

This model says there is not one serial RAM, but many serial RAMs connected by a network. Each serial RAM’s memory is private to other RAMs; consequently, the processors must coordinate and communicate by sending and receiving messages. Sort of like passing notes to that other kid you had a crush on in school.

Network model

Sub-topics include:

  • The basic algorithmic model
  • Intro to the Message Passing Interface, a practical programming model
  • Reasoning about the effects of network topology
  • Dense linear algebra
  • Sorting
  • Sparse graph algorithms
  • Graph partitioning

Combinations!

Of course, a real HPC cluster or supercomputer system is a combination of all of these models! For the most ambitious among you, the final assignment of the course will give you an opportunity to combine as many of these techniques as you can.

Who? (Teaching Staff)

  • Main instructor: Andrew Becker (OMSCS and CSE 6220-OMS alumnus extraordinaire)
  • Head Teaching Assistant: Prof. Richard (Rich) Vuduc (Yup, the guy who created the course has since been demoted 😂)
  • Teaching assistants: Avik Samanta, Hal Elrod, Jing-Jing Li, JR Smith, Kristina Theroux, Pars Piroozi, Tim Hu, and Yuanhao Wang
  • Course co-developers (Udacity, though no longer affiliated therewith): Amanda Deisler

Where? (Logistics)

Your first and best source of questions and answers is the class discussion forum. We make all critical announcements there, including when assignments go out, reminders about when they are due, clarification of various policies, errata, and hints.

The teaching staff strongly prefers that, if you need to contact them, you post a private note to the instructors on Ed Discussions rather than sending email. (You’ll get a much faster response by doing so.)

For more tips on using the forums effectively, see these notes.

Office hours. We will hold weekly office hours and record them for students that are unable to attend. You are responsible for information conveyed in them. Please see the class forums for details.

School Supplies

Minimum Technical Requirements.

  • Pencil, paper, and brain!
  • Browser and connection speed: We strongly recommend an up-to-date version of Chrome or Firefox. We also support Internet Explorer 9 and the desktop versions of Internet Explorer 10 and above (not the metro versions). 2+ Mbps recommended; at minimum 0.768 Mbps download speed
  • Operating System: The following are the recommended operating systems for the course. We may also elect to provide virtual machines with a standardized environment, though most of the assignments can be completed by directly logging into the HPC resource we will provide via secure shell (ssh).
    • PC: Windows XP or higher with latest updates installed
    • Mac: OS X 10.6 or higher with latest updates installed
    • Linux: Any recent distribution that has the supported browsers installed.

Textbook and readings. There will be supplemental readings, provided via papers and the following textbook (available electronically to Georgia Tech students for free):

How? (Assignments and Exams)

Beyond the videos, there will be supplemental readings and a set of programming assignments. See the schedule for a list of assignments, their due dates, and their associated lessons.

Due date convention. For all labs, the posted due date should be interpreted as “23:59 anywhere on earth” (11:59 pm AOE). For example, if an assignment is due on January 31, as long as there is any place on the planet Earth where it is 11:59 pm or earlier, your submission is on time. (You are responsible for taking signal transmission delays into account, especially if you are connecting from outer space.)

For the ultimate in precision timing, here’s a handy AOE clock: http://www.timeanddate.com/time/zones/aoe

Grading. For each programming assignment you submit, we will check your implementation against some test cases for correctness and we will also measure how fast your implementation is. The minimum grade for a correct (but possibly slow) implementation is roughly equivalent to a ‘C’; to get a higher grade, your implementation should also be fast, with a ‘B’ meaning reasonable performance and ‘A’ for high performance.

But how do you know whether your implementation is slow or fast? We will post guidelines with each assignment.

Late policy. You get one “late pass” – that is, for any programming assignment, you may submit the assignment up to 48 hours after the official due date without penalty. Any assignment submitted after you’ve used your one pass or after 48 hours (with or without a pass) will get zero credit. We have to enforce a hard limit so that we can grade the assignments and return results within a reasonable timeframe.

Late passes may not be used for exams. No late exams will be graded or accepted.

Collaboration policy. All Georgia Tech students are expected to uphold the Georgia Tech Academic Honor Code. Honest and ethical behavior is expected at all times. All incidents of suspected dishonesty will be reported to and handled by the Dean of Students office. Penalties for violating the collaboration policy can be severe; alleged violations are adjudicated by the Dean of Students office and not by the instructor.

Collaboration on assignments is encouraged at the “whiteboard” level. That is, you may share ideas and have technical conversation, which we especially encourage on the class forums, but you must write and submit your own code.

Exams must be done individually but are open-book and “open-internet” (for looking up information only). That is, while taking the exam you are not allowed to use the internet or to actively get help by, for instance, posting questions or messaging with others. We will use Honorlock to administer the exams. (The students shown on the Honorlock website all look so happy, so it must be good, right?)

We require you to complete a quiz each term acknowledging you have read and understand these policies.

Regrade policy. We’ll be happy to review your submissions and recalculate your grade, provided your request is specific and you submit your request (as a private Ed Discussions post) within 14 days of grades being released. The caveat with this though is your score has a possibility of going down if your time/speedup gets worse. If the assignment has a reference serial execution time, it will be recalculated at the time of regrading to ensure consistency.

The above holds similarly when asking for regrades on exam questions. If you ask us to regrade a question and we notice something that should have been deducted, but wasn’t, then we reserve the right to adjust your score accordingly.

Regrades do not apply to performance scores for projects where you are scaled against other students.

Anything else? (Miscellaneous)

Schedule

Follow this link to see the overall schedule of events, which will appear in a Google Sheet.

On that sheet, the percentages you see attached to various assignments indicate their weight in determining your final grade. You are responsible for pacing yourself, but please use your peers and Piazza to keep tabs on where the class is with respect to the material. We will try to post a “Recommended Progress” note on Piazza each week to help remind you of where you should be.

Due date convention. For all labs, the posted due date should be interpreted as “23:59 anywhere on earth” (11:59 pm AOE). For example, if an assignment is due on January 29, as long as there is any place on the planet Earth where it is 11:59 pm or earlier, your submission is on time. (You are responsible for taking signal transmission delays into account, especially if you are connecting from outer space.)

For the ultimate in precision timing, here’s a handy AOE clock: http://www.timeanddate.com/time/zones/aoe

Labs

The following is a list of lab assignments for this class. For a reminder of all due dates and associated lessons, please refer to the schedule. Remember that the due time is always 23:59 AOE – we’ve included “Time Zone Converter” links below, but you are responsible for mastering this convention! Lastly, be mindful of time changes due to Daylight Savings Time!

Lab 0 (10%)     – due Mon Sep 20 AOE

Lab 1 (15%)     – due Mon Oct 11 AOE

Lab 2 (20%)     – due Mon Nov 8 AOE

Lab 3 (20%)     – due Mon Nov 29 AOE