GridKa2018

Introduction to Python

The course targets beginning Python developers and people familiar with scripting. The basics required to complete the course are covered, but ideally you already feel comfortable writing small scripts in any language. We highly recommend using your own laptop (Linux, macOS, Cygwin) for the exercises.

Introduction to Go 2018

In this workshop, we will introduce the basics of programming in Go and then work our way up to concurrency programming with this relatively new language.

We'll start with the usual "Hello World" program, introduce functions, variables, packages, and then interfaces. Then we will tackle the two main tools at the disposal of the Go programmer (colloquially known as a gopher): channels and goroutines. We will do this by implementing a small peer-to-peer application that transmits text messages over the network.
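
The workshop code itself will be in Go, but for a rough taste of the message-passing idea, here is a hedged analogy in Python (not course material): a queue.Queue plays the role of a channel, and threads stand in for goroutines.

    import queue
    import threading

    def producer(ch):
        # Send a few messages over the "channel", then signal completion.
        for text in ["hello", "world"]:
            ch.put(text)
        ch.put(None)  # sentinel: no more messages

    def consumer(ch):
        # Receive until the sentinel arrives, like ranging over a Go channel.
        while True:
            msg = ch.get()
            if msg is None:
                break
            print("received:", msg)

    ch = queue.Queue()  # stands in for: ch := make(chan string)
    threads = [threading.Thread(target=producer, args=(ch,)),
               threading.Thread(target=consumer, args=(ch,))]
    for t in threads:
        t.start()       # roughly: go producer(ch); go consumer(ch)
    for t in threads:
        t.join()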

The workshop wraps up with a whirlwind tour of scientific and non-scientific libraries readily available, and prospects/news about the next Go version.

References

Participants will need to install the Go compiler on their laptops. The instructions for each operating system are detailed at:

To get a taste of what Go looks like and to get their feet wet, participants can also follow the interactive, browser-based, installation-free tour at:

The material for this course is available at:

- https://github.com/sbinet/gridka-go-tuto

Behind the scenes perspective: into the abyss of profiling for performance

This workshop is an introduction to profiling a system or individual programs. We will concentrate on profiling to understand aspects of program performance. All discussion and examples are presented with Linux in mind, and some Intel-specific information will be given.

Topics include:

Overview of

  • CPU microarchitecture (Intel mainly)
  • OS: processes and threads
  • Parallel programming models
  • From algorithmic complexity to real-code performance expectations

Profiling & Performance: types and typical metrics

  • How to profile
  • Introduction to various tools

Hands-on exercises using profiling tools

  • Some HEP specific cases and examples

It is assumed that participants are already familiar with Linux. The application-level portion of the profiling exercises will use C or C++ programs.
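
The hands-on exercises use C/C++ programs and system-level tools, but the basic profile-then-optimize workflow can be previewed with Python's built-in cProfile module; the sketch below is an analogy only, not course material.

    import cProfile

    def slow_sum(n):
        # Deliberately naive: builds an intermediate list before summing.
        return sum([i * i for i in range(n)])

    def fast_sum(n):
        # Generator expression avoids the intermediate list.
        return sum(i * i for i in range(n))

    # Prints per-function call counts and cumulative times, the same
    # kind of metrics that perf or gprof report for C/C++ code.
    cProfile.run("slow_sum(10**6); fast_sum(10**6)")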

Please bring a laptop with an SSH client installed to be able to connect to the servers and run the exercises.

Collaborative Software Development 2018

Writing maintainable software is a prerequisite in many fields. Especially when working on projects with many members, it is essential to

  • write readable software and documentation,
  • enable versioning of software,
  • ensure correctness of software,
  • enable automated tests of software, and
  • enable agile workflows based on issue tracking.

However, the goals of maintainable software are not only relevant when working in teams, but also in private projects. This makes the topic relevant for anybody who needs to write and maintain software. Based on experiences from projects in academia and industry, this tutorial introduces tools and concepts that enable maintainable software projects in collaborative environments. While we try to give a broad overview of different topics, we also flexibly provide in-depth information depending on your feedback during the course. We cover topics such as version control and organisation of software with git, concepts of unit testing and test-driven development, and tools supporting continuous integration as well as their integration with wikis and ticket systems.
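
As a hedged taste of the unit-testing part, here is a minimal sketch using Python's built-in unittest module (the function under test and all names are illustrative, not taken from the course material):

    import unittest

    def mean(values):
        # The unit under test: arithmetic mean of a non-empty sequence.
        if not values:
            raise ValueError("mean() of empty sequence")
        return sum(values) / len(values)

    class TestMean(unittest.TestCase):
        def test_simple_mean(self):
            self.assertEqual(mean([1, 2, 3]), 2)

        def test_empty_input_raises(self):
            with self.assertRaises(ValueError):
                mean([])

    if __name__ == "__main__":
        unittest.main()  # a CI system would run this on every push

In a test-driven workflow, the failing tests would be written first and the implementation added afterwards until they pass.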

Throughout this tutorial you will learn how to efficiently integrate different tools and concepts to enable maintainable software. After the course, you will have a basic setup that can be adapted to your specific needs.

This course is a hands-on tutorial and requires basic knowledge of Python programming. For the best learning experience and an overview of the encompassing software development process, we suggest combined participation in the workshops Introduction to Python and Collaborative Software Development.

Databases for large-scale science

In this workshop, students will (a) learn how to efficiently use relational and non-relational databases for modern large-scale scientific experiments, and (b) learn how to create database workflows suitable for analytics and machine learning.

First, the workshop focuses on teaching efficient, safe, and fault-tolerant principles for dealing with high-volume and high-throughput database scenarios. This includes, but is not limited to, systems such as PostgreSQL, Redis, and Elasticsearch. Topics include query planning and performance analysis, transactional safety, SQL injection, and lock contention.
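
As a hedged preview of two of these topics, the sketch below uses Python's built-in sqlite3 module (the course covers larger systems such as PostgreSQL) to show a parameterized query, the standard defence against SQL injection, and a peek at the query plan:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, energy REAL)")
    conn.executemany("INSERT INTO events (energy) VALUES (?)",
                     [(10.5,), (42.0,), (7.3,)])

    threshold = 10.0
    # Parameterized query: the value is bound by the driver, never
    # interpolated into the SQL string, so it cannot inject SQL.
    rows = conn.execute("SELECT id, energy FROM events WHERE energy > ?",
                        (threshold,)).fetchall()
    print(rows)

    # Inspect how the engine plans to execute the query
    # (PostgreSQL offers the same idea via EXPLAIN ANALYZE).
    for step in conn.execute("EXPLAIN QUERY PLAN "
                             "SELECT id FROM events WHERE energy > ?",
                             (threshold,)):
        print(step)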

Second, we focus on how to prepare data from these databases so that it is usable by analytics and machine-learning frameworks such as Keras.
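
A minimal sketch of that preparation step, assuming a pandas/NumPy toolchain and a hypothetical events table (the workshop's actual pipeline may differ):

    import sqlite3

    import numpy as np
    import pandas as pd

    conn = sqlite3.connect("experiment.db")  # hypothetical database file

    # Pull a feature table straight into a DataFrame ...
    df = pd.read_sql("SELECT energy, momentum, label FROM events", conn)

    # ... then convert it to the dense float arrays that frameworks
    # like Keras expect as input.
    X = df[["energy", "momentum"]].values.astype(np.float32)
    y = df["label"].values
    print(X.shape, y.shape)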

An intermediate understanding of Python, SQL, and Linux shell scripting is recommended to follow this course. An understanding of machine learning principles is not required.

Docker Container Hands-On

Container technologies are rapidly becoming the preferred way for developers and system administrators to distribute, deploy, and run services. They provide the means to start a lightweight virtualization environment, i.e., a container, based on Linux kernel namespaces and control groups (cgroups). Such a virtualization environment is cheap to create, manage, and destroy, requires a negligible amount of time to set up, and provides performance comparable to that of the host. Docker offers an intuitive way to manage containers by abstracting and automating the low-level configuration of namespaces and cgroups, ultimately enabling the development of an entire ecosystem of tools and products around containers.

This workshop covers aspects ranging from the basic concepts of Docker (e.g., setting up a Docker environment on your machine, running a container interactively, building, tagging, and publishing images) to the deployment of complex service stacks using container clusters and orchestration software (e.g., Docker Compose and Kubernetes). The workshop will discuss in detail the concepts of network, volume, and resource management, demonstrating that containers are suitable for a variety of applications and highlighting their advantages over traditional virtual machines.
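
The workshop works with the docker command line, but as a hedged illustration, the same basic run-a-container step can also be scripted from Python via the Docker SDK (pip install docker; not part of the course material):

    import docker

    client = docker.from_env()  # talks to the local Docker daemon

    # Equivalent to: docker run --rm alpine echo "hello from a container"
    output = client.containers.run("alpine",
                                   ["echo", "hello from a container"],
                                   remove=True)
    print(output.decode())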

Note: The workshop includes hands-on exercises. To get the most out of the tutorial part, you should bring your own laptop and have an Internet connection. You should also be comfortable working with the Linux terminal, editing files with common editors (e.g., vi, nano, emacs), and installing packages from the command line.

Hacking Hands-on

In this IT security workshop, the participants will switch sides and take on the role of a hacker attacking servers and services within a prepared environment.

During the workshop we will play with different web applications waiting to be hacked. Many web apps have striking bugs that threaten the data of millions of users. You will learn about SQL injection, scripting issues, request forgery, and more. We will also explore and use the Metasploit Framework, a tool that helps attackers choose and run exploits against one or many targets.
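
To make the SQL injection topic concrete before the hands-on part, here is a self-contained Python sketch (not the lab environment itself) of why building queries by string concatenation is dangerous:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'alice-secret')")
    conn.execute("INSERT INTO users VALUES ('bob', 'bob-secret')")

    # A classic injection payload entered into a "name" form field:
    user_input = "nobody' OR '1'='1"

    # VULNERABLE: user input is pasted into the SQL text, so the
    # OR '1'='1' clause makes the WHERE condition always true.
    query = "SELECT * FROM users WHERE name = '" + user_input + "'"
    print(conn.execute(query).fetchall())   # dumps every row

    # SAFE: a bound parameter is treated as data, never as SQL.
    print(conn.execute("SELECT * FROM users WHERE name = ?",
                       (user_input,)).fetchall())  # no rows match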

Every part of the workshop starts with a condensed introduction of the basics of the topic. After that, it's your turn! You have the opportunity to replay the demos and explore further techniques and possibilities of the exploit tools. Finally, you can attack and try to "pwn" servers with varying levels of difficulty in the lab environment. At the end of every unit we will discuss your findings and experiences together.

Requirements for participants

You should be familiar with the Unix command line and the concept of manpages. A basic understanding of common web technologies and the ability to read scripting languages is helpful. Knowledge of TCP/IP and network services is also recommended.

Please have an SSH client (OpenSSH, PuTTY, MobaXterm, ...) ready on your laptop to connect to the lab environment. The operating system of your laptop does not matter. All necessary tools will be provided, but you can of course install additional software tools you want to play with.

Introduction to the SciPy stack and Jupyter Notebooks

Python provides a rich ecosystem of open-source software for mathematics, science, and engineering. This tutorial will introduce you to the fundamental packages of the SciPy stack. You will learn how to perform fast numerical calculations in N dimensions using NumPy, analyze your data using Pandas, and visualize the results using Matplotlib. The exercises will be performed in the Jupyter Notebook environment, which you can access through your web browser. You will need a tablet or a laptop and basic knowledge of the Python programming language.
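
As a small preview of the three packages working together (the actual notebook exercises will differ):

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    # NumPy: fast vectorized math on N-dimensional arrays.
    x = np.linspace(0, 2 * np.pi, 100)
    y = np.sin(x)

    # Pandas: tabular data with labels, plus quick summaries.
    df = pd.DataFrame({"x": x, "sin(x)": y})
    print(df.describe())

    # Matplotlib: visualize the result.
    plt.plot(df["x"], df["sin(x)"])
    plt.xlabel("x")
    plt.ylabel("sin(x)")
    plt.show()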

Introduction to using HTCondor to run distributed compute Jobs and Workflows on Servers, Clusters, Grids, or Clouds

HTCondor (http://htcondor.org) is an open-source high-throughput computing software framework for coarse-grained distributed parallelization of computationally intensive tasks (jobs). It can be used to manage workloads on a single server, a cluster of computers, public cloud resources, or even national computing grids like the Open Science Grid (http://opensciencegrid.org/).


This workshop will introduce the concept of high-throughput computing and show how to submit large batches of jobs, as well as job workflows (pipelines), to HTCondor, which will be of interest to end users. We will discuss the architecture of the system, and participants will create a unified compute cluster.
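
The exercises will use HTCondor's own submit files and command-line tools; purely as a hedged preview, a minimal job description can also be expressed through HTCondor's Python bindings (treat the details as assumptions about your installed version):

    import htcondor  # HTCondor's Python bindings

    # Describe a job: run /bin/echo on a worker node and capture output.
    job = htcondor.Submit({
        "executable": "/bin/echo",
        "arguments": "hello from HTCondor",
        "output": "hello.out",
        "error": "hello.err",
        "log": "hello.log",
    })
    print(job)  # prints the equivalent classic submit-file description

    # On a recent HTCondor, the job is handed to the local scheduler with:
    # htcondor.Schedd().submit(job)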


If you wish to participate in the hands-on exercises (not required, but recommended), you will need a laptop with WiFi, an SSH client (such as PuTTY if using Windows), familiarity with the Linux command-line environment (cd, less, cp, rm, mkdir, etc.), and the ability to edit a file using a Linux terminal editor such as vi, vim, or nano.

Julia: high performance programming the easy way

Besides offering fresh ideas and new programming concepts, Julia was mainly created to solve the two-language problem. The two-language problem describes the common pattern of prototyping an algorithm in an easy-to-use, high-level language and then reimplementing it in a fast language like C, doubling development costs and making updates and further development more complicated. This has also led to a split in scientific computing: you work mainly in a scripting language, while all the performance-critical libraries are unapproachable black boxes written in a more difficult language. For most users this is okay, but for developers it makes growing the ecosystem more difficult, and it is not as easy to engage users in contributing back to the core library.

Julia solves this with a sophisticated compiler model which manages to combine the usability of dynamic scripting languages with the performance of low level languages.

In this workshop I will introduce the basic mechanisms of how Julia works and teach some fun programming examples showing how to use Julia's type system and metaprogramming, and how to make any Julia program run as fast as highly optimized C, all while remaining at least as readable as Python code!

I will also show some more advanced examples explaining how Julia can offer completely new possibilities for library developers by having high-performance libraries written in a dynamic language. One of those examples is how to seamlessly move your code to the GPU and, e.g., do automatic differentiation on the GPU and CPU alike without losing any performance.

Machine Learning with Neural Networks

Machine learning, and especially deep learning, is one of the current hot topics in computer science and engineering. It has not only experienced tremendous advancements in its theoretical foundations during the last few years, but is now also the state-of-the-art method in a broad range of applications. In this course, you will

  • learn the basic terms and approaches in machine learning,
  • understand the fundamental concepts of logistic regression and neural networks, and
  • build your own first deep learning models.

Using small to mid-sized application use cases from science and computer vision, you are going to experience how to put the gained knowledge into practice. As the machine learning framework of choice, we are going to use the TensorFlow library as the computational back-end to the deep learning library Keras in the Python programming language (some prior knowledge is necessary). Using modern GPU computing resources in a cluster computing system, we are going to look at typical machine learning applications, such as classification problems and numerical regression analysis. Please make sure to bring your own laptop and to refresh your basic knowledge of vectors and matrices. We are looking forward to having you!
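
For a hedged flavour of what such a model looks like (the course's real models and data will differ), here is a tiny Keras classifier trained on random stand-in data:

    import numpy as np
    from tensorflow import keras

    # Stand-in data: 1000 samples, 20 features, binary labels.
    X = np.random.rand(1000, 20).astype("float32")
    y = (X.sum(axis=1) > 10).astype("float32")

    # A small fully connected network for binary classification.
    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)
    print(model.evaluate(X, y, verbose=0))  # [loss, accuracy]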

Parallel programming with OpenMP and MPI

OpenMP (Open Multi-Processing) is a programming interface for shared-memory parallelization on multiprocessor computers. The Message Passing Interface (MPI) is a communication standard describing the exchange of messages for distributed-memory parallelization on parallel computers. Both programming concepts will be introduced with simple examples. In this course, you will learn to write simple parallel programs using both interfaces.
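
The course exercises are in C/C++, but the MPI message-passing idea can be previewed from Python through the mpi4py bindings; this hedged sketch (run with, e.g., mpirun -n 2 python hello_mpi.py) is an analogy, not course material:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD      # all processes started by mpirun
    rank = comm.Get_rank()     # this process's id
    size = comm.Get_size()     # total number of processes

    if rank == 0:
        # Rank 0 sends a message to rank 1, mirroring MPI_Send in C.
        comm.send("hello from rank 0", dest=1, tag=0)
        print(f"rank 0 of {size}: message sent")
    elif rank == 1:
        msg = comm.recv(source=0, tag=0)   # mirrors MPI_Recv
        print(f"rank 1 of {size}: received {msg!r}")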

Since this course is conducted on Linux systems, you should be able to use the command line and have some basic programming skills in C/C++.

Productive GPU Programming with OpenACC

OpenACC is a directive-based programming model for highly parallel systems, which allows for automated generation of portable GPU code. In this tutorial, we will get to know the programming model with examples, learn how to use the associated tools environment, and incorporate first strategies for performance optimization into our programs. Finally, we will integrate OpenACC with other GPU programming strategies.

Quantum Computing

This Quantum Computing tutorial will enable the participants to access and run calculations on real quantum computers from IBM. The course gives an introduction to the IBM Q Experience as well as to the Quantum Information Science Kit (Qiskit), an open-source quantum computing framework for leveraging today's quantum processors and conducting research. Basic knowledge of quantum mechanics, linear algebra, and Python is helpful but not mandatory.
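
As a hedged first taste, using the Qiskit API of roughly this era (newer releases have changed the execution interface), here is the preparation and sampling of an entangled Bell state:

    from qiskit import Aer, QuantumCircuit, execute

    # Build a 2-qubit circuit that prepares an entangled Bell state.
    qc = QuantumCircuit(2, 2)
    qc.h(0)          # superposition on qubit 0
    qc.cx(0, 1)      # entangle qubit 1 with qubit 0
    qc.measure([0, 1], [0, 1])

    # Run on the local simulator; the same circuit can be sent to a
    # real IBM Q device through your IBM Q Experience account.
    backend = Aer.get_backend("qasm_simulator")
    counts = execute(qc, backend, shots=1024).result().get_counts()
    print(counts)    # expect roughly half '00' and half '11'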

Scalable and reproducible workflows with Pachyderm

Data scientists must manage analyses that consist of multiple stages, large datasets, and a great number of tools, all while maintaining reproducibility of results. Among the variety of tools available for parallel computation, Pachyderm is an open-source workflow engine and distributed data processing tool that fulfils these needs by creating a data pipelining and data versioning layer on top of projects from the container ecosystem. In this workshop you will learn how to:

  • create a simple local Kubernetes infrastructure,
  • install and interact with Pachyderm and
  • implement a scalable and reproducible workflow using containers.

Instructions: https://github.com/jonandernovella/gridka-pachyderm
Relevant paper: https://doi.org/10.1093/bioinformatics/bty699
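
Purely for orientation, a Pachyderm pipeline is declared as a JSON specification handed to pachctl; the sketch below builds such a spec as a Python dict. The field names follow the general pipeline-spec layout, but treat every detail as an assumption and defer to the instructions linked above.

    import json

    # Hypothetical pipeline: run a containerized script over every
    # datum in the versioned input repo "samples".
    pipeline_spec = {
        "pipeline": {"name": "wordcount"},
        "transform": {
            "image": "python:3",                      # container to run
            "cmd": ["python", "/app/count.py"],       # hypothetical script
        },
        "input": {
            "pfs": {"repo": "samples", "glob": "/*"}  # one datum per file
        },
    }

    # Saved to a file, this would be registered with something like:
    #   pachctl create-pipeline -f pipeline.json  (form varies by version)
    print(json.dumps(pipeline_spec, indent=2))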

Scalable Scientific Analysis in Python using Pandas and Dask

Pandas is a Python package that provides data structures for working with heterogeneous, relational/tabular data. It provides fundamental building blocks for powerful and flexible data analysis. Pandas offers functionality to load a wide set of data formats, manipulate the resulting data, and visualize it using various plotting frameworks. In the workshop we will show how to clean and reshape data in Pandas and use the concept of split-apply-combine for exploratory analysis. Pandas provides powerful tooling for data analysis on a single machine but is mostly constrained to a single CPU. To parallelize and distribute these tasks, one can use Dask.

Dask is a flexible tool for parallelizing Python code on a single machine or across a cluster. We can think of Dask at a high and a low level: at the high level, Dask provides Array, Bag, and DataFrame collections that mimic NumPy, lists, and Pandas but can operate in parallel on datasets that don't fit into main memory; these collections are alternatives to NumPy and Pandas for large datasets. At the low level, Dask provides dynamic task schedulers that execute task graphs in parallel. These execution engines power the high-level collections mentioned above but can also power custom, user-defined workloads. In the tutorial, we will cover the high-level use of dask.array and dask.dataframe.
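
A compact, hedged illustration of both halves, split-apply-combine in Pandas and the same computation on a partitioned Dask DataFrame (assuming dask is installed; the table below is random stand-in data):

    import numpy as np
    import pandas as pd
    import dask.dataframe as dd

    # Pandas: split-apply-combine on an in-memory table.
    df = pd.DataFrame({
        "detector": np.random.choice(["A", "B", "C"], size=10_000),
        "energy": np.random.exponential(scale=5.0, size=10_000),
    })
    print(df.groupby("detector")["energy"].mean())  # split, apply, combine

    # Dask: the same DataFrame API, but partitioned and lazy.
    ddf = dd.from_pandas(df, npartitions=4)
    result = ddf.groupby("detector")["energy"].mean()  # builds a task graph
    print(result.compute())  # a scheduler executes the graph in parallel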

Under the hood: Bare Metal Embedded Programming in C

A single plain C file is sufficient to express an embedded program.

As the Arm Cortex-M architecture is designed with C code in mind, no assembly-level system bring-up code is required. This workshop will teach you how to write C code that runs on a bare-metal CPU without an operating system or support libraries like libc.

It will give you insight into how linkers can be configured to run your program at the right location and place data correctly. We will use the free arm-gcc toolchain and related tools from the toolchain to analyze the program at the assembly level, to better understand how C is mapped to machine code depending on the chosen compiler optimization level and linker settings.

The workshop will further introduce you to how low-level features like stacks and interrupts are used and how they map onto Arm assembly code. One of the purposes of this course is to lay out the programming methods for talking to hardware in a minimal configuration. Our broader target is a better understanding of the interaction with low-level hardware and toolchains for embedded systems.

Last but not least, we will present debugging techniques for low-level/OS development and might talk about security features of the microcontroller platform used.

In case you’re interested in reading material on the topic, we recommend “The Definitive Guide to ARM® Cortex®-M3 and Cortex®-M4 Processors, 3rd Edition”, but it is by no means required for participating in this course.

An Internet-connected laptop is required for participating in this workshop. Please install the latest version of Docker on your system and verify that it is running and that your system is up to date. We will provide a Docker-based Linux environment with a pre-installed arm-gcc toolchain.

Basic knowledge of the C programming language is required.