Tuesday, May 19, 2015

Designing an Undergraduate Class on High Performance Computing

Since the advent of multicore processors in the mid 2000s, parallel architectures have become mainstream. These days, it is extremely difficult to find a single-core processor anywhere, from cell phones and tablets to laptops. Consequently, there is growing interest in training students to program parallel architectures, and a class on high-performance computing (HPC) is gaining popularity in universities. Part of this interest comes from the fact that HPC is, for the most part, synonymous with parallel computing.

Here are some of the challenges in designing an undergraduate-level class on HPC in Latin America:
  • List of Topics. This is probably the first big decision to make: which topics are in, and which are out? The answer depends on many factors, including the expertise of the instructor, the availability of resources, and the background of the students. As a general guide, a class on HPC should target junior and senior students (3rd and 4th year into the program) in computer science or computer engineering. That is a safe way to guarantee students have enough background in computer science concepts (algorithms, systems, programming). The recommendation is to include at least these topics: architecture, interconnects, performance models (see the Amdahl's law example after this list), parallel algorithms, shared-memory programming, distributed-memory programming, accelerator programming, and some form of parallel programming patterns.
  • Programming Languages. The point above listed three programming platforms, and there should be at least one programming language for each. The recommendation is to use open-source compilers. A safe bet is C/C++, a language most students are familiar with. Shared-memory systems can be programmed using the OpenMP standard; most C/C++ compilers provide an implementation of OpenMP, and the GNU compiler, gcc, offers a good one. Distributed-memory systems can be programmed using MPI, the de facto standard for this type of system. There are many implementations; MPICH and Open MPI are two open-source, well-maintained options. Finally, accelerators can be programmed using CUDA, NVIDIA's mechanism for programming its GPUs. The nvcc compiler is free and extremely capable at compiling CUDA code. An alternative to CUDA is OpenCL, a standard that targets accelerator architectures in general. (Short sketches of all three models appear after this list.)
  • High Performance Computing Resources. This is the biggest challenge. Supercomputers are sometimes available at Latin American institutions; in that case, the system administrators may be willing to provide the instructor with an education allocation on the system. If no supercomputer is readily available, it is possible to get an allocation on international supercomputers through agreements or access programs (check this post or this other post for more information). Finally, if none of the above is an option, a simple desktop provides a minimal testbed for at least shared-memory and distributed-memory programming; if the desktop is equipped with a GPU, the puzzle is complete. Some laboratories may even feature multiple desktop computers with this configuration.
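To give students a first taste of the performance-models topic, a quick worked example helps (the numbers here are illustrative, not taken from any particular course). Amdahl's law says that if a fraction p of a program can be parallelized across n processors, the overall speedup is at most 1 / ((1 - p) + p/n). With p = 0.9 and n = 64 processors, the speedup is at most 1 / (0.1 + 0.9/64) ≈ 8.8, and no processor count can ever push it past 1/0.1 = 10. Working through bounds like this early on helps students set realistic expectations for their parallel codes.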
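To make the three programming models concrete, here are minimal sketches of each, written for this post rather than drawn from any course; the file names and compile commands are assumptions based on standard toolchain usage.

A shared-memory example in C with OpenMP, where a single pragma parallelizes a reduction loop:

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        double sum = 0.0;
        int i;

        /* The pragma splits the loop across threads; reduction(+:sum)
           gives each thread a private copy of sum and combines them. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < N; i++) {
            a[i] = 0.5 * i;
            sum += a[i];
        }

        printf("threads available: %d, sum = %f\n",
               omp_get_max_threads(), sum);
        return 0;
    }

Compile with, e.g., gcc -fopenmp sum_omp.c -o sum_omp.

A distributed-memory example in C with MPI, where every process contributes a value and a collective operation combines them:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size, local, total;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Every process contributes its rank; MPI_Reduce adds the
           contributions and leaves the result on rank 0. */
        local = rank;
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks 0..%d is %d\n", size - 1, total);

        MPI_Finalize();
        return 0;
    }

Compile and run with, e.g., mpicc sum_mpi.c -o sum_mpi and mpirun -np 4 ./sum_mpi.

An accelerator example in CUDA C, where each GPU thread scales one element of a vector (unified memory requires CUDA 6 or later):

    #include <stdio.h>

    /* Each thread scales one element of the vector. */
    __global__ void scale(float *x, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            x[i] *= a;
    }

    int main(void) {
        const int n = 1 << 20;
        float *x;

        /* Unified memory (CUDA 6+) keeps the example short. */
        cudaMallocManaged(&x, n * sizeof(float));
        for (int i = 0; i < n; i++)
            x[i] = 1.0f;

        scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n);
        cudaDeviceSynchronize();

        printf("x[0] = %f (expected 2.0)\n", x[0]);
        cudaFree(x);
        return 0;
    }

Compile with, e.g., nvcc scale.cu -o scale.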
Finally, one piece of advice for keeping the class dynamic is to start with programming examples as soon as possible. A simple (yet powerful) language for introducing data and loop parallelism is Cilk Plus, and compilers such as gcc and Clang now recognize Cilk constructs (as sketched below).
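As a quick taste, here is a minimal Cilk Plus sketch showing both task and loop parallelism; it assumes a Cilk Plus-enabled compiler such as gcc 5 with the -fcilkplus flag:

    #include <stdio.h>
    #include <cilk/cilk.h>

    /* Task parallelism: cilk_spawn runs fib(n-1) as a separate task;
       cilk_sync waits for it before the results are combined. */
    long fib(long n) {
        if (n < 2)
            return n;
        long x = cilk_spawn fib(n - 1);
        long y = fib(n - 2);
        cilk_sync;
        return x + y;
    }

    int main(void) {
        long r[8];

        /* Data parallelism: iterations of a cilk_for loop may run
           concurrently across worker threads. */
        cilk_for (int i = 0; i < 8; i++)
            r[i] = fib(i + 20);

        for (int i = 0; i < 8; i++)
            printf("fib(%d) = %ld\n", i + 20, r[i]);
        return 0;
    }

Compile with, e.g., gcc -std=c99 -fcilkplus fib_cilk.c -lcilkrts -o fib_cilk. The appeal for a first lecture is that removing the cilk_ keywords leaves an ordinary serial C program, so students can see exactly where the parallelism was introduced.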
