Cloud scripts

I’ve just created a new repository on GitHub: https://github.com/rybaktomasz/nuage It contains Python scripts intended to cooperate with bootstrap-vz, the code for building Debian images for various cloud providers. For now they use Boto and work only with Amazon Web Services. There are also scripts that simplify working with AWS a bit, like running machines on EC2 and storing files on S3. They currently support Python 2; as soon as Debian contains Boto with support for Python 3, I’ll move them to Python 3. I do not have far-reaching plans for this repository – I intend those to be just my personal scripts. Do not expect frequent commits here. If you find them useful – good for you. If not – sorry, but they are just for me, not for everyone.
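
A minimal sketch (not taken from the repository; the region, AMI id, key name, and bucket are placeholders) of the kind of Boto calls such scripts wrap:

    import boto
    import boto.ec2

    # Start a single instance on EC2.
    ec2 = boto.ec2.connect_to_region('eu-west-1')
    reservation = ec2.run_instances('ami-00000000',
                                    instance_type='t1.micro',
                                    key_name='my-key')
    print(reservation.instances[0].id)

    # Store a local file on S3.
    s3 = boto.connect_s3()
    bucket = s3.get_bucket('my-bucket')
    key = bucket.new_key('backups/example.tar.gz')
    key.set_contents_from_filename('example.tar.gz')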

PyOpenCL in main!

Yesterday (2014-05-26) my sponsor Piotr Ożarowski uploaded a new version of PyOpenCL to Debian. Usually I can upload new versions of packages to Debian myself, as I am a Debian Maintainer. But this time it was a very special upload: it closed bug 723132, asking to move PyOpenCL from contrib to main. Because Debian contains free OpenCL implementations, Beignet and Mesa, one can run OpenCL programs using FLOSS code.
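
For example, a small PyOpenCL program like the sketch below (the kernel and data are just an illustration) can run on top of those free implementations:

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()      # picks whatever OpenCL implementation is installed
    queue = cl.CommandQueue(ctx)

    a = np.arange(16, dtype=np.float32)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    prg = cl.Program(ctx, """
        __kernel void twice(__global const float *a, __global float *out) {
            int gid = get_global_id(0);
            out[gid] = 2.0f * a[gid];
        }
        """).build()

    prg.twice(queue, a.shape, None, a_buf, out_buf)

    result = np.empty_like(a)
    cl.enqueue_copy(queue, result, out_buf)
    print(result)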

Moving a package from contrib to main meant that PyOpenCL had to be removed from contrib and uploaded anew to main. Thanks to Piotr for sponsoring it, and to the FTP masters for accepting it from NEW and dealing with all this removal and adding of the package.

There is still work to do. Rebecca Palmer is working on allowing all OpenCL implementations to be installed at the same time, which should lead to more experimentation and easier work with OpenCL, but requires changes to many of the OpenCL-related packages. I’m also thinking about moving PyOpenCL to use pybuild, but this needs to wait till I have more free time.

Let’s hope that having PyOpenCL in main will allow more people to find and use it.

Debian Cloud Images

Cloud computing is gaining momentum. Debian has its own team, the Debian Cloud Team, created during the last DebConf in Switzerland, with an Alioth page and a mailing list. The team’s description is “We work to ensure that Debian, the Universal operating system, works well in public, private, and hybrid clouds.”

To ensure proper usage of Debian in the cloud we need to solve two main issues: we need to be able to create the system image used by a virtual machine, and we need to be able to configure the virtual machine when it is started.

cloud-init

Every system needs configuration; it is usually done during installation and after the first system run. During configuration we create user accounts and passwords and decide which packages to install. After starting the system we configure the installed programs, deploy data (e.g. for a web server), and so on.

Similar needs exist for machines running in the cloud. However, we do not install systems on virtual machines individually but use one of the available images. We also do not have console access to systems running in the cloud, so we need to put SSH keys on the machine to ensure that we have SSH access to it. Also, as cloud usage usually means that we are running machines in large quantities, we should configure each machine and its programs automatically. This is especially useful when machines are started and stopped without our intervention, e.g. for auto scaling, or when a machine is started to replace another one that crashed.

There is also configuration related to the cloud itself. We might want to configure the set of repositories used to install or update packages. It might be a good idea to use specific repositories, e.g. ones provided by the cloud provider, so that we do not reach outside the cloud (thus avoiding network transfer costs) and so that we can use packages provided by the cloud provider, e.g. kernels and drivers specific to the hardware or virtualization solution the images run on.

Most distributions use cloud-init, a set of scripts written in Python, to configure a virtual machine when it starts. It works with different cloud providers and with different Linux distributions. It allows for providing user-data to deploy to images, and for providing a script which will be run during startup.
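
As an illustration, when launching an EC2 instance with Boto one can pass cloud-init user-data roughly like this (the AMI id, key name, and cloud-config contents are placeholders, not an official recipe):

    import textwrap
    import boto.ec2

    USER_DATA = textwrap.dedent("""\
        #cloud-config
        package_update: true
        packages:
          - nginx
        runcmd:
          - [ sh, -c, 'echo "configured by cloud-init" > /etc/motd' ]
        """)

    ec2 = boto.ec2.connect_to_region('eu-west-1')
    ec2.run_instances('ami-00000000',            # placeholder Debian AMI
                      instance_type='t1.micro',
                      key_name='my-key',
                      user_data=USER_DATA)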

cloud-init is developed on Launchpad and its source code is kept in a Bazaar repository. It has good documentation. All files are kept in /var/lib/cloud; configuration is kept as YAML files.

Debian contains a cloud-init package. Because of some packaging problems cloud-init was not included in Wheezy (the latest stable release), but it is available in wheezy-backports.

Image creation

Everyone can create their own custom images to run on the cloud. Amazon provides documentation on how to create AMIs (image files) for running on EC2.

There is a Debian wiki page describing how to create an AMI, based on the official documentation. The process of creating an AMI manually is rather complicated and involves many steps; the wiki page warns that it is a work in progress and unfinished.

The Debian wiki also contains a page describing the creation of Debian Installer images on AWS. There is no script for automatic creation of those images yet. The page contains the steps one needs to follow to create the images, in a manner similar to the one described in the paragraph above. Creation of such images requires cloud-init.

Instead of creating images ourselves we can use images created by others – there is a market of available images. Companies and organizations can provide images they have created. Such images are more trusted by users, as they have been created and configured by the organizations responsible for the software contained in them.

There is a set of official Debian cloud images, just like there is a set of official CD and DVD images. The situation is most advanced for Amazon Web Services (AWS) EC2: James Bromberger was delegated by the Debian Project Leader to manage Debian images on the AWS Marketplace. He also maintains a list of current images and manages the AWS account which serves as the owner of the provided images.

build-debian-cloud

Creating images manually is a long task, and it is better to have scripts to create them. Extending the script to allow for configuration of the created images allows for experimentation and for providing different images. This is the role of “build-debian-cloud”, a script written in Python, intended to build Debian images for different cloud IaaS providers. Currently AWS and VirtualBox are supported as providers.

The build-debian-cloud source is hosted on GitHub and is currently developed by Anders Ingemann, who is working on the WIP-python branch on cloud-init and HVM support. It is forked from a repository started by http://www.camptocamp.com/, but this original repository has been inactive since April 2013.

The script uses a JSON manifest file for configuring the built images, and a JSON schema for validating the provided manifest. It requires:

  • debootstrap
  • parted
  • grub2
  • jsonschema
  • qemu-utils
  • euca2ools – a set of scripts to access AWS; I’m not sure whether aws-cli can be used instead
  • boto – Python module to access AWS

It is a task-based system, with tasks organized into modules, which eases configuration of the created images. It logs many aspects of the image creation process to help with solving problems and to provide feedback on image building. It also provides rollback to recover from problems during image creation.

The repository contains the following files and directories, described in the sections below:

  • base – Python code managing building of images.
  • build-debian-cloud – Simple script calling main() from "base".
  • common – Python code used by base, plugins, providers.
  • CONTRIBUTING.md – Tips for extending build-debian-cloud. The source does not fully follow PEP 8; for example it uses both tabs and spaces and allows for 110-column lines. One can use the pep8 tool with the following checks disabled to check the source code: E101, E211, E241, E501, W191.
  • logs – Directory for logs generated during image creation.
  • manifests – Files with manifests for building different cloud images.
  • plugins – Directory with various plugins which can enable functionality in built images.
  • providers – Directory with modules for building images for different cloud providers.
  • README.md

The script is still a work in progress. For example, it can currently only build PVM-based AMIs, not HVM ones – so images built by it cannot be used to run GPU instances.

base

This directory contains the basis of the functionality, which then uses information from the other directories. It exports:

  • Manifest from manifest.py
  • Phase from phase.py
  • Task from task.py
  • main from main.py

log.py defines the logging functionality, including the classes ConsoleFormatter and FileFormatter.
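
To illustrate what such formatters typically look like (this is not the actual log.py code, only a sketch based on the standard logging module):

    import logging

    class ConsoleFormatter(logging.Formatter):
        """Short messages for the terminal."""
        def format(self, record):
            return '%s: %s' % (record.levelname.lower(), record.getMessage())

    class FileFormatter(logging.Formatter):
        """Timestamped messages for the log file."""
        def __init__(self):
            logging.Formatter.__init__(
                self, '%(asctime)s %(levelname)s %(name)s: %(message)s')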

main.py defines the functions used by the build-debian-cloud script. main() parses arguments (calling get_args()), sets up logging, and calls run(). run() loads the manifest (class Manifest) and prepares the list of tasks (class TaskList) to use according to the manifest, using the list of available tasks, plugins, and providers. Then it creates a BootstrapInformation object using the manifest. Then it calls tasklist.run() to execute all tasks in the appropriate order. In case of an exception it rolls back the changes using the task list.
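
A rough, self-contained sketch of that flow (class and method names follow the description above, but the stubs and details are only an assumption, not the real base/ code):

    class Manifest(object):
        def __init__(self, path):
            self.path = path                # in reality: load, validate, parse JSON

    class BootstrapInformation(object):
        def __init__(self, manifest):
            self.manifest = manifest

    class TaskList(object):
        def load(self, manifest):
            pass                            # gather tasks from base, plugins, providers
        def run(self, info):
            pass                            # execute tasks in dependency order
        def revert(self, info):
            pass                            # roll back completed tasks

    def run(manifest_path):
        manifest = Manifest(manifest_path)
        tasklist = TaskList()
        tasklist.load(manifest)
        info = BootstrapInformation(manifest)
        try:
            tasklist.run(info)
        except Exception:
            tasklist.revert(info)           # rollback, then re-raise for logging
            raise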

manifest-schema.json contains the JSON schema used by the code in manifest.py to check the validity of the manifest used to build an image. The manifest contains the following sections (a small validation sketch follows the list):

  • provider
  • bootstrapper
    • mirror
    • workspace
    • tarball
  • system
    • release – only wheezy for now
    • architecture
    • bootloader
    • timezone
    • locale
    • charmap
  • packages
    • mirror
    • sources
    • remote
    • local
  • volume
  • plugins
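
As a rough illustration, validating a manifest with the jsonschema module could look like the sketch below; the manifest values are made up and may not satisfy the real schema:

    import json
    import jsonschema

    # Hypothetical, abbreviated manifest following the sections listed above.
    manifest = {
        "provider": "ec2",
        "bootstrapper": {"workspace": "/tmp/bootstrap"},
        "system": {"release": "wheezy", "architecture": "amd64",
                   "bootloader": "pvgrub", "timezone": "UTC",
                   "locale": "en_US", "charmap": "UTF-8"},
        "packages": {},
        "volume": {},
        "plugins": {},
    }

    with open('manifest-schema.json') as fp:      # run from the repository root
        schema = json.load(fp)
    jsonschema.validate(manifest, schema)         # raises ValidationError on problems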

manifest.py defines the class Manifest, used to manage the manifest describing the image. It loads (load()), validates (validate()), and parses (parse()) the JSON file. load() minifies the JSON using a function from minify_json.py and loads all providers and plugins used in the manifest. validate() validates the manifest against the main schema and against schemas defined by modules and providers – which can alter the JSON schema. parse() exposes the JSON as object attributes:

  • provider
  • bootstrapper
  • image
  • volume
  • system
  • packages
  • plugins

task.py defines the abstract class Task, which is used to implement the tasks performed while creating images. Child classes must implement the method run(). Each Task has a phase (class Phase, defined in phase.py, used for ordering) and lists of predecessors and successors.
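
A toy example of what a task in such a system could look like (illustrative only; the class and attribute names are assumptions, not the real build-debian-cloud interfaces):

    class Task(object):
        phase = None
        predecessors = []
        successors = []

        def run(self, info):
            raise NotImplementedError()

    class FormatVolume(Task):
        phase = 'volume_preparation'        # one of the phases listed later

        def run(self, info):
            # 'info' would carry the BootstrapInformation object
            info.volume.format()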

tasklist.py defines the class TaskList, used to order Tasks. All Tasks are kept in a set, created in load(). The method run() calls create_list() to create the list of tasks to run, and then runs each task and adds it to the list tasks_completed. create_list() uses check_ordering() to check the validity of the phases of predecessors and successors, then checks for cycles in the dependency graph by finding strongly_connected_components(), and then calls topological_sort() to order the tasks so they can be run.
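
The ordering step is a standard topological sort; a minimal sketch of that algorithm (not the actual tasklist.py code) could be:

    # Kahn's algorithm over a dict mapping each task to the set of its
    # predecessors; purely illustrative.
    def topological_sort(predecessors):
        ordered = []
        remaining = {t: set(deps) for t, deps in predecessors.items()}
        while remaining:
            ready = [t for t, deps in remaining.items() if not deps]
            if not ready:
                raise ValueError('dependency cycle detected')
            for t in ready:
                ordered.append(t)
                del remaining[t]
            for deps in remaining.values():
                deps.difference_update(ready)
        return ordered

    # Example: print(topological_sort({'a': [], 'b': ['a'], 'c': ['a', 'b']}))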

The directory pkg contains the definitions of the classes responsible for managing packages. exceptions.py defines two exception classes: PackageError and SourceError. sourcelist.py defines two classes: Source, describing one source package (deb-src), and SourceList, describing a set of source packages. packagelist.py defines PackageList, managing the list of binary packages.

Partitions and Volumes

The directory fs contains the definitions of classes used to manage volumes and partitions in the created images. exceptions.py defines two exceptions: VolumeError and PartitionError. volume.py defines the class Volume, used as the base class by all the other classes. Volume contains the methods _after_create() and _check_blocking(), which are used at the appropriate moments when creating an image. It defines the set of events it can respond to (a toy sketch of this state machine follows the list):

  • create, changing state from nonexistent to detached
  • attach, changing state from detached to attached
  • detach, changing state from attached to detached
  • delete, changing state from detached to deleted
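
A toy, dict-based version of such an event/state table (purely illustrative, not the FSMProxy implementation the script actually uses) might look like:

    TRANSITIONS = {
        ('nonexistent', 'create'): 'detached',
        ('detached',    'attach'): 'attached',
        ('attached',    'detach'): 'detached',
        ('detached',    'delete'): 'deleted',
    }

    class Volume(object):
        def __init__(self):
            self.state = 'nonexistent'

        def fire(self, event):
            try:
                self.state = TRANSITIONS[(self.state, event)]
            except KeyError:
                raise RuntimeError('event %r not allowed in state %r'
                                   % (event, self.state))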

The directory partitions contains the definitions of classes used to manage partitions. abstract.py defines the AbstractPartition class. It contains methods related to events from the partition lifetime: _before_format(), mount() and _before_mount(), _after_mount(), _before_unmount(), add_mount(), remove_mount(), get_uuid(). It also defines the events it can respond to:

  • create, changing state from nonexistent to created
  • format, changing state from created to formatted
  • mount, changing state from formatted to mounted
  • unmount, changing state from mounted to formatted

base.py defines the class BasePartition, which adds more event-related methods: create(), get_index(), get_start(), map(), _before_map(), _before_unmap(). Those are needed to manage partitions within partition maps. It also changes the available states and events:

  • create, changing state from nonexistent to unmapped
  • map, changing state from unmapped to mapped
  • format, changing state from mapped to formatted
  • mount, changing state from formatted to mounted
  • unmount, changing state from mounted to formatted
  • unmap, changing state from formatted to unmapped_fmt
  • map, changing state from unmapped_fmt to formatted
  • unmap, changing state from mapped to unmapped

Other files define concrete partitions:

  • gpt.py defines GPTPartition inheriting from BasePartition
  • gpt_swap.py defines GPTSwapPartition inheriting from GPTPartition
  • mbr.py defines MBRPartition inheriting from BasePartition
  • mbr_swap.py defines MBRSwapPartition inheriting from MBRPartition
  • single.py defines SinglePartition inheriting from AbstractPartition.

The directory partitionmaps contains the definitions of classes used to manage disk volumes and their relation to partitions. abstract.py defines the class AbstractPartitionMap, used as the base for all other classes in this directory. It contains methods related to events from the partition map lifetime: create() and _before_create(), map() and _before_map(), unmap() and _before_unmap(), and is_blocking() and get_total_size(). It defines the set of events it can respond to, just like the Volume class:

  • create, changing state from nonexistent to unmapped
  • map, changing state from unmapped to mapped
  • unmap, changing state from mapped to unmapped

Other files contain definitions of concrete classes:

  • none.py defines NoPartitions
  • gpt.py defines GPTPartitionMap
  • mbr.py defines MBRPartitionMap

common

This directory contains code used by all the other parts – the main script, plugins, and providers. exceptions.py defines three exceptions: ManifestError, TaskListError, and TaskError.

fsm_proxy.py defines the FSMProxy class, the base class for all volume- and partition-related classes. It contains the methods responsible for event listeners and proxy methods. phases.py creates the Phase objects and puts them in the order array (a small example of using this ordering follows the list):

  1. preparation
  2. volume_creation
  3. volume_preparation
  4. volume_mounting
  5. os_installation
  6. package_installation
  7. system_modification
  8. system_cleaning
  9. volume_unmounting
  10. image_registration
  11. cleaning
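
Conceptually the array gives every phase a position, which can then serve as a coarse sort key for tasks before their individual dependencies are considered; a tiny illustration (not the actual phases.py code):

    ORDER = ['preparation', 'volume_creation', 'volume_preparation',
             'volume_mounting', 'os_installation', 'package_installation',
             'system_modification', 'system_cleaning', 'volume_unmounting',
             'image_registration', 'cleaning']

    def phase_index(task):
        return ORDER.index(task.phase)

    # tasks.sort(key=phase_index) would order tasks by phase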

task_sets.py contains the definitions of the available tasks, imported from common.tasks. All tasks are grouped into arrays holding related tasks:

  • base_set
  • volume_set
  • partitioning_set
  • boot_partition_set
  • mounting_set
  • ssh_set
  • apt_set
  • locale_set
  • bootloader_set

tools.py contains functions related to logging.

The directory assets contains assets used during image creation. Currently it contains an init.d directory.

The directory fs contains file-system-related classes; all such classes inherit from Volume, defined in base.fs.volume.

The directory tasks contains the definitions of classes describing tasks, all inheriting from base.task.Task. There are too many classes to describe them all, so I only list the files:

  • apt.py
  • boot.py
  • bootstrap.py
  • cleanup.py
  • development.py
  • filesystem.py
  • host.py
  • initd.py
  • locale.py
  • loopback.py
  • network.py
  • packages.py
  • partitioning.py
  • security.py
  • volume.py
  • workspace.py

plugins

This directory contains directories with plugins. Each directory is a Python module. Each module contains __init__.py and tasks.py files; it might also contain manifest-schema.json, README.md, and an assets directory. The file tasks.py contains the definitions of the task classes provided by the plugin, inheriting from base.task.Task.

Available modules:

  • admin_user
  • build_metadata
  • cloud_init
  • image_commands
  • minimize_size
  • opennebula
  • prebootstrapped
  • root_password
  • unattended_upgrades
  • vagrant

providers

This directory contains the available cloud providers – the targets of the generated images. Currently there are two providers:

  • ec2
  • virtualbox

Each directory is a Python module, containing __init__.py, manifest.py, and manifest-schema.json. It might also contain assets and tasks directories which define new tasks to be used when building the image.

manifests

This directory contains manifests which can be used to create Debian images. These manifests can also serve as examples to be customized. Currently it contains:

  • ec2-ebs-debian-official-amd64-hvm.manifest.json
  • ec2-ebs-debian-official-amd64-pvm.manifest.json
  • ec2-ebs-debian-official-i386-pvm.manifest.json
  • ec2-ebs-partitioned.manifest.json
  • ec2-ebs-single.manifest.json
  • ec2-s3.manifest.json
  • virtualbox.manifest.json
  • virtualbox-vagrant.manifest.json

Summary

Providing official Debian images for the cloud is as important as providing ISO images. Having scripts that help with this task means that we can do it more easily and use the saved time for other work – more developing and less housekeeping. If you are interested in the cloud, join the Debian Cloud Team.

Using the cloud to rebuild Debian

Making sure that all packages are of the necessary quality is hard work. That’s why there is a freeze before a new Debian release, to make sure that there are no known release-critical bugs.

There is the “Collab QA” team, whose role is “sharing results from QA tests (archive rebuilds, piuparts runs, and other static checks)”. It is also present on Alioth, where the source code for various tools is hosted.

As noted in the Collab QA description, one of the team’s responsibilities is rebuilding packages in the Debian archive. Large rebuilds are needed to test a new version of the compiler (e.g. during the transition from GCC 4.7 to 4.8) or when considering building packages using LLVM-based compilers. Most packages are built when they are uploaded, but some QA and tests require rebuilding large parts of the archive.

This requires large computational power. Thanks to Amazon’s support, Debian can access EC2 to run some of those tasks, as noted by Lucas Nussbaum in his “bits from the DPL – November 2013”.

Lucas Nussbaum (the current Debian Project Leader) has written some scripts to rebuild and test packages on Amazon EC2. The scripts can be downloaded from the git repository or cloned using the git protocol. They started as a tool for rebuilding the entire archive, and are now also used to test different versions of compilers (different gcc versions) and to compile packages using clang. Currently Lucas is not actively developing the code; David Suarez and Sylvestre Ledru took over that role.

The scripts are written in Ruby. I am not proficient in Ruby, so please forgive any mistakes and misunderstandings.

Full rebuild

Rebuilding is managed by one master node which runs all the time. While the master node controls the slave nodes, it is not responsible for starting and stopping them. The user is responsible for starting slave nodes (usually from their own machine) and for sending the list of them to the master node. The default setup, described in the README, uses 50 m1.medium nodes and 10 m2.xlarge nodes. The smaller nodes are used to compile small packages. The larger nodes are used to compile huge packages that need a lot of memory, like LibreOffice, x.org, etc.

Each slave node has one or more slots for dealing with tasks; this means that it is possible to run tests in parallel, e.g. compile more than one package at the same time.

The user is supposed to use the AWS-CLI tools or other means to manage slave nodes. AWS-CLI is not yet part of Debian, although Taniguchi Takagi wants to package and upload it to Debian (Bug #733211). For the time being you can download the AWS CLI source code from GitHub.

Spot instances are used to save on costs. This is possible because compiling packages (especially for tests, not as a step during uploading a package to the Debian archive) is not time critical. It is also idempotent (we can compile a package as many times as we want) and it deals well with being stopped when a spot instance is no longer available.
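
For example, requesting spot instances for the smaller build nodes could look roughly like this with Boto (the price, AMI id, count, and key name are placeholders, not the values used by the actual setup):

    import boto.ec2

    ec2 = boto.ec2.connect_to_region('us-east-1')
    requests = ec2.request_spot_instances(
        price='0.05',                 # maximum bid in USD per hour
        image_id='ami-00000000',      # placeholder slave-node AMI
        count=50,
        instance_type='m1.medium',
        key_name='rebuild-key')
    print([r.id for r in requests])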

All data sent between the nodes is encoded as JSON.  Using JSON allows for sending arrays and dictionaries, which means that it is easy to send the structures describing a package, rebuild options, logs, parameters, results, etc.

There is no communication between the user’s machine and the master node.  The user is supposed to SSH to the master node, clone the repository with the scripts, and run the scripts from inside this repository.  The master node communicates with the slave nodes using SSH and SCP; it sends the necessary scripts to the slave nodes and then runs them.

The usual workflow is described in the README:

  1. Request spot instances
  2. Wait for their start
  3. Connect to master node
  4. Prepare job description (list all packages to test)
  5. Run master script passing list of packages and list of nodes as arguments
  6. Wait for all tasks to finish
  7. Download result logs
  8. Stop all slave instances

The JSON contains information about the packages to compile. Each package is described using the following fields (a hypothetical example follows the list):

  • type – Whether to test package compilation or installation (instest).
  • package – Name of package to test.
  • dist – Debian distribution to test on.
  • esttime – Estimated time for performing test, used for building
  • logfile – Name of file to write log to.
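
A hypothetical task entry in that format (all values are made up; in particular the value of type for a rebuild is an assumption) could be:

    import json

    task = {
        "type": "rebuild",              # assumption; "instest" would mean installation testing
        "package": "pyopencl",
        "dist": "sid",
        "esttime": 600,
        "logfile": "pyopencl_sid.log",
    }
    print(json.dumps(task))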

The repository contains many scripts; their names usually convey their jobs. Scripts containing instest in their names are intended to test installation or upgrade of packages.

  • clean – removes all logs and JSON files describing tasks and slave nodes
  • create-instest-chroots – creates a chroot, debootstraps it, copies basic configuration, updates the system, copies maintscripts; works with sid, squeeze, and wheezy
  • genereate-tasks-* – Scripts for generating the JSON files describing tasks for the master to distribute to slave nodes.
  • genereate-tasks-instest – Reads all packages from the local repository and sets them up for installation testing.
  • genereate-tasks-rebuild – Reads the list of packages from the Ultimate Debian Database, excluding some, and creates the task list. Allows for limiting packages based on their build time. Uses an unstable chroot.
  • genereate-tasks-rebuild-jessie – Script for building Jessie packages, using a Jessie chroot.
  • genereate-tasks-rebuild-wheezy – Script for building Wheezy packages, using a Wheezy chroot.
  • gzip-logs
  • instest – Testing installation.
  • masternode – Script run on master node, sending all tasks to slaves.
  • merge-tasks – Merges JSON with description of tasks.
  • process-task – Main script run on slave node.
  • setup-ganglia – Installs the Ganglia monitor on a slave node, to monitor its health.
  • update – Updates chroot to newest versions.

masternode

It accepts files containing the list of packages to test and the list of slave nodes as command line arguments.

It connects to each slave node and uploads the necessary scripts (instest, process-task) to it.

For each node it creates as many threads as there are slots; each thread opens one SSH and one SCP connection.  Then each thread gets one task from the task queue and calls execute_one_task to process it. Success is logged if the task succeeds.  Otherwise the task is added to the retry queue. If there are no tasks left in the main queue, the number of available slots on the slave node is decreased, and the thread (except for the last one) ends.

The last thread for each node is responsible for dealing with failed tasks from the retry queue.  Again it loops over all available tasks, this time from the retry queue, and calls execute_one_task for each of them. This time each task is run alone on the node, so problems caused by concurrent compilation (e.g. compiling PyCUDA and PyOpenCL with hardening options on a machine with less than 4GB of memory is problematic) should be avoided. If a task fails again it is not retried, only logged.

The script creates one additional thread which periodically (every minute) checks whether there are any tasks left in the main and retry queues.

The script ends when all threads finish.

execute_one_task is a simple function. It encodes the task description into JSON and uploads the JSON to the slave node. Then it executes process-task on the slave node and downloads the log. It can also download the built package from the slave node and upload it to an archive using the reprepro script. The function returns whether the test succeeded or not.
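
The real implementation is in Ruby; purely as an illustration of the flow described above, a Python sketch of the per-node dispatch (with a stub execute_one_task and placeholder node names) might look like:

    import threading
    try:
        import queue             # Python 3
    except ImportError:
        import Queue as queue    # Python 2

    tasks = queue.Queue()        # main task queue
    retries = queue.Queue()      # tasks that failed on the first attempt

    def execute_one_task(node, task):
        # Stub: the real function uploads the task as JSON, runs process-task
        # over SSH and downloads the resulting log.
        print('running %s on %s' % (task, node))
        return True

    def worker(node, is_last_thread):
        while True:
            try:
                task = tasks.get_nowait()
            except queue.Empty:
                break
            if not execute_one_task(node, task):
                retries.put(task)
        if is_last_thread:
            # the last thread per node retries failed tasks, one at a time
            while True:
                try:
                    task = retries.get_nowait()
                except queue.Empty:
                    break
                execute_one_task(node, task)    # a second failure is only logged

    for pkg in ['package-a', 'package-b', 'package-c']:
        tasks.put(pkg)

    nodes = {'slave-1': 2, 'slave-2': 1}        # node -> number of slots (placeholders)
    threads = []
    for node, slots in nodes.items():
        for slot in range(slots):
            t = threading.Thread(target=worker, args=(node, slot == slots - 1))
            threads.append(t)
            t.start()
    for t in threads:
        t.join()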

process-task

This is the script run on a slave node for each task.  It reads the JSON file with the description of the task, passed as a command line argument. If the master node wants to test installation, it runs instest and exits. Otherwise it proceeds with testing the package build.

The script can accept options governing the package build process. For example it sets DEB_BUILD_OPTIONS=parallel=10 when we want to test parallel builds. It can also accept the versions of compilers and libraries to use during compilation. The script sets up repositories and package priorities to ensure that the proper versions of build dependencies are used. Then it calls sbuild to build the package and checks whether the estimate of the time needed to perform the test was correct.

instest

This script is used to test installation and upgrade of a package.  It uses a chroot to install the package into.

It accepts the chroot location and the package to test as command line arguments. It cleans the chroots and checks whether the package is already installed. The script tests installation in various circumstances: it installs only the dependencies or the build dependencies, installs the package, installs the package together with all the packages it recommends, and installs the package together with all the packages it recommends and suggests. The script can also test upgrading the package, to check whether the upgrade causes any problems.

There are some workarounds for MySQL and PostgreSQL; it looks like there are some problems with the post-inst scripts (which try to connect to the newly installed database) in those packages, so testing must take such failures into consideration.

Summary

Using the cloud helps with running many tests in a short time. Such tests can serve as a QA tool and as an aid for experimentation.  Building packages in a controlled environment, one which can easily be recreated and shut down, helps ensure that packages are of good quality.  At the same time the ability to run many tests and to prepare different environments can help with experimentation, e.g. testing different compilers, configuration options, and so on.

Thanks to Amazon and to James Bromberger for providing the grants allowing Debian to use AWS and EC2 to perform such tests.

PyOpenCL, PyCUDA, and Python 3

Last week new versions of the Debian packages for PyOpenCL and PyCUDA reached Debian unstable. Support for Python 3 is the largest change in the new PyCUDA.

Both PyOpenCL and PyCUDA now support Python 3 and contain compiled modules for Python 3.2 and 3.3. Both provide *-dbg packages for easier debugging. Because of the addition of the debug packages, and of Python 3 support for PyCUDA, all the packages had to go through the Debian NEW queue.

The uploaded packages do not block the Python 3.3 transition because they are built against Python 3.3 and provide support for it. They will need to be rebuilt during the Boost 1.53 transition – and I hope to upload new versions at the same time.