Debian Cloud Images

Cloud computing is gaining momentum. Debian has its own team, the Debian Cloud Team, created during the last DebConf in Switzerland, with an Alioth page and a mailing list. The team’s description is “We work to ensure that Debian, the Universal operating system, works well in public, private, and hybrid clouds.”

To use Debian properly in the cloud we need to solve two main issues: we need to be able to create the system image used by a virtual machine, and we need to be able to configure the virtual machine when it starts.

cloud-init

Every system needs configuration; it is usually done during installation and after the first system run. During configuration we create user accounts and passwords and decide which packages to install. After starting the system we configure the installed programs, deploy data (e.g. for a web server), and so on.

Similar needs exist for machines running in the cloud. However, we do not install systems on virtual machines individually but use one of the available images. We also do not have console access to systems running in the cloud, so we need to put SSH keys on the machine to ensure that we have SSH access to it. Also, as cloud usage usually means running machines in large quantities, we should configure each machine and its programs automatically. This is especially useful when machines are started and stopped without our intervention, e.g. for auto scaling, or when a machine is started as a replacement after another one crashed.

There is also configuration specific to the cloud. We might want to configure the set of repositories used to install or update packages. It might be a good idea to use specific repositories, e.g. ones provided by the cloud provider, so that we do not reach outside (thus avoiding network transfer costs) and so that we can use packages provided by the cloud provider, e.g. kernels and drivers specific to the hardware or virtualization solution used to run the images.

Most distributions use cloud-init, a set of scripts written in Python, to configure a virtual machine when it starts. It works with different cloud providers and with different Linux distributions. It allows us to provide user-data to deploy to images, and to provide a script which will be run during startup.
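To make the idea of user-data more concrete, here is a minimal sketch that generates a cloud-config document. It relies on the fact that JSON is, for practical purposes, valid YAML, so no YAML library is needed. The function name make_user_data and the sample key, package, and command are my own placeholders; ssh_authorized_keys, packages, and runcmd are standard cloud-config keys.

```python
import json


def make_user_data(ssh_keys, packages, commands):
    """Return a cloud-config document as text (hypothetical helper)."""
    config = {
        "ssh_authorized_keys": list(ssh_keys),  # keys added to the default user
        "packages": list(packages),             # packages installed on first boot
        "runcmd": list(commands),               # commands run on first boot
    }
    # cloud-init recognises cloud-config data by this first line.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


user_data = make_user_data(
    ssh_keys=["ssh-rsa AAAA...example user@example.org"],  # placeholder key
    packages=["nginx"],
    commands=["echo provisioned > /var/tmp/provisioned"],
)
print(user_data)
```

The resulting text is what one would pass as user-data when starting an instance.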

cloud-init is developed on Launchpad and its source code is kept in a Bazaar repository. It has good documentation. All files are kept in /var/lib/cloud; configuration is kept as YAML files.

Debian contains a cloud-init package. Because of some packaging problems cloud-init was not included in Wheezy (the latest stable release), but it is available in Wheezy backports.

Image creation

Anyone can create custom images to run in the cloud. Amazon provides documentation on how to create AMIs (image files) for running on EC2.

There is a Debian wiki page describing how to create an AMI, based on the official documentation. The process of creating an AMI manually is rather complicated and involves many steps; the wiki page warns that it is a work in progress and unfinished.

The Debian wiki also contains a page describing the creation of Debian Installer images on AWS. There is no script for automatic creation of those images yet. The page lists the steps one needs to follow to create images, in a manner similar to that described in the paragraph above. Creation of such images requires cloud-init.

Instead of creating images ourselves we can use images created by others; there is a market of available images. Companies and organizations can provide images they have created. Such images are more trusted by users, as they have been created and configured by the organizations responsible for the software contained in them.

There is a set of official Debian cloud images, just like there is a set of official CD and DVD images. The situation is most advanced for Amazon Web Services (AWS) EC2; James Bromberger was delegated by the Debian Project Leader to manage Debian images on the AWS Marketplace. He also maintains the list of current images and manages the AWS account which serves as the owner of the provided images.

build-debian-cloud

Creating images manually is a long task, so it is better to have scripts that create them. Extending such a script so that the created images can be configured allows for experimentation and for providing different images. This is the role of “build-debian-cloud”, a script written in Python, intended to build Debian images for different cloud IaaS providers. Currently AWS and VirtualBox are supported as providers.

The build-debian-cloud source is hosted on GitHub and is currently developed by Anders Ingemann, who is working on cloud-init and HVM support on the WIP-python branch. It was forked from a repository started by http://www.camptocamp.com/, but the original repository has been inactive since April 2013.

The script uses a JSON manifest file to configure the images it builds, and it uses a JSON schema to validate the provided manifest. It requires:

  • debootstrap
  • parted
  • grub2
  • jsonschema
  • qemu-utils
  • euca2ools – a set of scripts to access AWS; I’m not sure whether aws-cli can be used instead
  • boto – Python module to access AWS

It is a task-based system, with tasks organized into modules, which eases configuration of the created images. It logs many aspects of the image creation process to help with solving problems and to provide feedback on image building. It also provides rollback to recover from problems during image creation.

The repository contains the following files and directories, described in the sections below:

  • base – Python code managing building of images.
  • build-debian-cloud – Simple script, calling main() from “base”.
  • common – Python code used by base, plugins, providers.
  • CONTRIBUTING.md – Tips for extending build-debian-cloud. The source does not fully follow PEP8; for example, it uses tabs and spaces and allows lines of up to 110 columns. One can check the source code using pep8 with the following checks disabled: E101, E211, E241, E501, W191.
  • logs – Directory for logs generated during image creation.
  • manifests – Files with manifests for building different cloud images.
  • plugins – Directory with various plugins which can enable functionality in built images.
  • providers – Directory with modules for building images for different cloud providers.
  • README.md

The script is still a work in progress. For example, it can currently build only PVM-based AMIs, not HVM ones, so images built by it cannot be used to run GPU instances.

base

It contains the core functionality, which then uses information from the other directories. It exports:

  • Manifest from manifest.py
  • Phase from phase.py
  • Task from task.py
  • main from main.py

log.py defines logging functionality, including classes ConsoleFormatter and FileFormatter.

main.py defines the functions used by the build-debian-cloud script. main() parses arguments (calling get_args()), sets up logging, and calls run(). run() loads the manifest (class Manifest) and prepares the list of tasks (class TaskList) to run according to the manifest, using the lists of available tasks, plugins, and providers. It then creates a BootstrapInformation object from the manifest and calls tasklist.run() to execute all tasks in the appropriate order. If an exception occurs, it rolls back the changes using the task list.
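The run-then-roll-back pattern used by run() can be sketched as follows. This is only an illustration written for Python 3, not the project's code: I assume each task comes as a (do, undo) pair, and make_step() and the step names are hypothetical. The real rollback mechanism may well work differently.

```python
class TaskList:
    """Toy task list: runs (do, undo) pairs in order and can undo them."""

    def __init__(self, steps):
        self.steps = steps
        self.tasks_completed = []

    def run(self, info):
        for do, undo in self.steps:
            do(info)
            # Only steps that finished are recorded, so a failing step
            # is never rolled back itself.
            self.tasks_completed.append(undo)

    def rollback(self, info):
        # Undo in reverse order of completion.
        while self.tasks_completed:
            self.tasks_completed.pop()(info)


def run(tasklist, info):
    """Execute all tasks; on any exception roll back and report failure."""
    try:
        tasklist.run(info)
    except Exception as exc:
        info["error"] = str(exc)
        tasklist.rollback(info)
        return False
    return True


def make_step(name, fail=False):
    """Build a hypothetical (do, undo) pair that records what happened."""
    def do(info):
        if fail:
            raise RuntimeError(name + " failed")
        info["done"].append(name)

    def undo(info):
        info["done"].remove(name)
        info["undone"].append(name)

    return do, undo


info = {"done": [], "undone": []}
steps = [make_step("create_volume"), make_step("mount_volume"),
         make_step("bootstrap", fail=True)]
succeeded = run(TaskList(steps), info)
print(succeeded, info)
```

Here the third step fails, so the two completed steps are undone in reverse order and nothing is left behind.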

manifest-schema.json contains the JSON schema used by the code in manifest.py to check the validity of the manifest used to build an image. The manifest contains the following sections:

  • provider
  • bootstrapper
    • mirror
    • workspace
    • tarball
  • system
    • release – only wheezy for now
    • architecture
    • bootloader
    • timezone
    • locale
    • charmap
  • packages
    • mirror
    • sources
    • remote
    • local
  • volume
  • plugins

manifest.py defines the class Manifest, used to manage the manifest describing an image. It loads (load()), validates (validate()), and parses (parse()) the JSON file. load() minifies the JSON using a function from minify_json.py and loads all providers and plugins used in the manifest. validate() validates the manifest against the main schema and against the schemas defined for modules and providers, which can alter the JSON schema. parse() exposes the JSON as object attributes:

  • provider
  • bootstrapper
  • image
  • volume
  • system
  • packages
  • plugins
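The load, validate, parse sequence can be sketched with the standard library alone. This is a simplified stand-in, not the real class: the comment stripping only handles whole-line // comments (the real code uses minify_json.py), validation only checks that some top-level sections are present (the real code uses the JSON schema), and the list of required sections and the example values are my assumptions.

```python
import json


class ManifestError(Exception):
    """Raised when the manifest is not usable."""


class Manifest:
    # Sections this sketch insists on; the authoritative rules live in
    # manifest-schema.json, so this tuple is an assumption.
    REQUIRED = ("provider", "bootstrapper", "system", "packages", "volume")

    def __init__(self, text):
        self.data = self.load(text)
        self.validate()
        self.parse()

    def load(self, text):
        # Crude stand-in for minification: drop whole-line // comments.
        lines = [line for line in text.splitlines()
                 if not line.lstrip().startswith("//")]
        return json.loads("\n".join(lines))

    def validate(self):
        missing = [key for key in self.REQUIRED if key not in self.data]
        if missing:
            raise ManifestError("missing sections: " + ", ".join(missing))

    def parse(self):
        # Expose each section as an attribute of the manifest object.
        for key in self.REQUIRED:
            setattr(self, key, self.data[key])
        self.plugins = self.data.get("plugins", {})


# Illustrative manifest; the values are placeholders, not a working setup.
EXAMPLE = """
{
  // provider-specific settings are omitted in this sketch
  "provider": "virtualbox",
  "bootstrapper": {"workspace": "/target"},
  "system": {"release": "wheezy", "architecture": "amd64"},
  "packages": {},
  "volume": {}
}
"""

manifest = Manifest(EXAMPLE)
print(manifest.system["release"])
```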

task.py defines the abstract class Task, which is used to implement the tasks performed while creating images. Child classes must implement the method run(). Each Task has a phase (class Phase, defined in phase.py, used for ordering) and lists of predecessors and successors.
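An abstract task class along these lines could look like the sketch below, written for Python 3 with the abc module. The attribute names follow the description above, but the concrete tasks, the use of plain strings for phases, and the dictionary used as the info object are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Task(ABC):
    phase = None        # set by subclasses; used for coarse ordering
    predecessors = []   # tasks that must run before this one
    successors = []     # tasks that must run after this one

    @abstractmethod
    def run(self, info):
        """Perform the task, reading and updating the shared info object."""


class CreateWorkspace(Task):
    """Hypothetical task that prepares a working directory entry."""
    phase = "preparation"

    def run(self, info):
        info["workspace"] = "/tmp/example-workspace"


class Bootstrap(Task):
    """Hypothetical task that depends on CreateWorkspace."""
    phase = "os_installation"
    predecessors = [CreateWorkspace]

    def run(self, info):
        info["bootstrapped_in"] = info["workspace"]


info = {}
for task_class in (CreateWorkspace, Bootstrap):
    task_class().run(info)
print(info)
```

Because run() is abstract, instantiating Task directly raises TypeError, which catches a subclass that forgot to implement it.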

tasklist.py defines the class TaskList, used to order Tasks. All Tasks are kept in a set, created in load(). The method run() calls create_list() to create the list of tasks to run, then runs each task and adds it to the list tasks_completed. create_list() uses check_ordering() to check the validity of the phases of predecessors and successors, then checks for cycles in the dependency graph by finding strongly_connected_components(), and finally calls topological_sort() to order the tasks so they can be run.
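The ordering step can be illustrated with a small topological sort. Note that this sketch uses Kahn's algorithm, which detects cycles as a side effect, rather than the strongly-connected-components check the project uses. Tasks are represented by plain names, and the example task names are hypothetical.

```python
def create_list(tasks, predecessors):
    """Order tasks so that every task comes after its predecessors.

    tasks: iterable of task names.
    predecessors: dict mapping a task name to the names it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    names = set(tasks)
    # Ignore dependencies on tasks that are not part of this run.
    remaining = {t: set(predecessors.get(t, ())) & names for t in names}
    ordered = []
    while remaining:
        # Tasks with no unmet dependencies; sorted for a stable result.
        ready = sorted(t for t, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cyclic dependency among: "
                             + ", ".join(sorted(remaining)))
        ordered.extend(ready)
        for t in ready:
            del remaining[t]
        for deps in remaining.values():
            deps.difference_update(ready)
    return ordered


order = create_list(
    ["mount", "create_volume", "bootstrap", "format"],
    {"format": {"create_volume"}, "mount": {"format"}, "bootstrap": {"mount"}},
)
print(order)
```

Successor relations can be handled by the same function if each "A has successor B" is first rewritten as "B has predecessor A".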

Directory pkg contains the definitions of classes responsible for managing packages. exceptions.py defines two exception classes: PackageError and SourceError. sourcelist.py defines two classes: Source, describing one source package (deb-src), and SourceList, describing a set of source packages. packagelist.py defines PackageList, which manages a list of binary packages.

Partitions and Volumes

Directory fs contains the definitions of classes used to manage partitions in created images. exceptions.py defines two exceptions: VolumeError and PartitionError. volume.py defines the class Volume, used as the base class by all other classes. Volume contains the methods _after_create() and _check_blocking(), which are used at the appropriate moments when creating an image. It defines the set of events it can respond to:

  • create, changing state from nonexistent to detached
  • attach, changing state from detached to attached
  • detach, changing state from attached to detached
  • delete, changing state from detached to deleted
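The four events above form a small state machine, which can be sketched as a lookup table keyed by (state, event). This is not the project's implementation: the method name fire() is hypothetical, and the real classes go through FSMProxy.

```python
class VolumeError(Exception):
    """Raised when an event is not allowed in the current state."""


class Volume:
    # (current state, event) -> next state, as listed above.
    TRANSITIONS = {
        ("nonexistent", "create"): "detached",
        ("detached", "attach"): "attached",
        ("attached", "detach"): "detached",
        ("detached", "delete"): "deleted",
    }

    def __init__(self):
        self.state = "nonexistent"

    def fire(self, event):
        """Apply an event, or raise VolumeError and leave the state alone."""
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            raise VolumeError("cannot %s a volume that is %s"
                              % (event, self.state))
        self.state = self.TRANSITIONS[key]


volume = Volume()
for event in ("create", "attach", "detach"):
    volume.fire(event)
print(volume.state)
```

The table makes invalid sequences explicit: an attached volume, for example, has to be detached before it can be deleted.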

Directory partitions contains the definitions of classes used to manage partitions. abstract.py defines the AbstractPartition class. It contains methods related to events in a partition's lifetime: _before_format(), mount() and _before_mount(), _after_mount(), _before_unmount(), add_mount(), remove_mount(), and get_uuid(). It also defines the events it can respond to:

  • create, changing state from nonexistent to created
  • format, changing state from created to formatted
  • mount, changing state from formatted to mounted
  • unmount, changing state from mounted to formatted

base.py defines the class BasePartition, which adds more event-related methods: create(), get_index(), get_start(), map(), _before_map(), and _before_unmap(). Those are needed to manage partitions in partition maps. It also changes the available states and events:

  • create, changing state from nonexistent to unmapped
  • map, changing state from unmapped to mapped
  • format, changing state from mapped to formatted
  • mount, changing state from formatted to mounted
  • unmount, changing state from mounted to formatted
  • unmap, changing state from formatted to unmapped_fmt
  • map, changing state from unmapped_fmt to formatted
  • unmap, changing state from mapped to unmapped
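The point of the extra unmapped_fmt state is that a partition remembers it has already been formatted while it is unmapped. The sketch below replays event sequences against the table above to show this; the replay() helper is my own, not part of the project.

```python
# (current state, event) -> next state, copied from the list above.
PARTITION_TRANSITIONS = {
    ("nonexistent", "create"): "unmapped",
    ("unmapped", "map"): "mapped",
    ("mapped", "format"): "formatted",
    ("formatted", "mount"): "mounted",
    ("mounted", "unmount"): "formatted",
    ("formatted", "unmap"): "unmapped_fmt",
    ("unmapped_fmt", "map"): "formatted",
    ("mapped", "unmap"): "unmapped",
}


def replay(events, state="nonexistent"):
    """Apply events in order and return the final state (hypothetical helper)."""
    for event in events:
        key = (state, event)
        if key not in PARTITION_TRANSITIONS:
            raise ValueError("event %r not allowed in state %r"
                             % (event, state))
        state = PARTITION_TRANSITIONS[key]
    return state


# Unmapping and remapping a formatted partition keeps it formatted...
print(replay(["create", "map", "format", "unmap", "map"]))
# ...while an unformatted partition returns to plain "mapped".
print(replay(["create", "map", "unmap", "map"]))
```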

Other files define concrete partitions:

  • gpt.py defines GPTPartition inheriting from BasePartition
  • gpt_swap.py defines GPTSwapPartition inheriting from GPTPartition
  • mbr.py defines MBRPartition inheriting from BasePartition
  • mbr_swap.py defines MBRSwapPartition inheriting from MBRPartition
  • single.py defines SinglePartition inheriting from AbstractPartition.

Directory partitionmaps contains the definitions of classes used to manage disk volumes and their relations to partitions. abstract.py defines the class AbstractPartitionMap, used as the base for all other classes in this directory. It contains methods related to events from the partition map lifetime: create() and _before_create(), map() and _before_map(), unmap() and _before_unmap(), as well as is_blocking() and get_total_size(). It defines the set of events it can respond to, just like the Volume class:

  • create, changing state from nonexistent to unmapped
  • map, changing state from unmapped to mapped
  • unmap, changing state from mapped to unmapped

Other files contain definitions of concrete classes:

  • none.py defines NoPartitions
  • gpt.py defines GPTPartitionMap
  • mbr.py defines MBRPartitionMap

common

This directory contains code used by all other classes – main script, plugins, and providers. exceptions.py defines three exceptions: ManifestError, TaskListError, and TaskError.

fsm_proxy.py defines the FSMProxy class, the base class for all volume- and partition-related classes. It contains methods responsible for event listeners and proxy methods. phases.py creates the Phase objects and puts them in the order array:

  1. preparation
  2. volume_creation
  3. volume_preparation
  4. volume_mounting
  5. os_installation
  6. package_installation
  7. system_modification
  8. system_cleaning
  9. volume_unmounting
  10. image_registration
  11. cleaning
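The phase order gives a coarse ordering of tasks, and it can also be used to reject impossible dependencies. The sketch below uses plain strings instead of Phase objects. The rule applied in check_ordering(), that a predecessor may not belong to a later phase than the task depending on it, is my assumption about what such a check would verify; the task names are hypothetical.

```python
# Phase names in the order listed above.
PHASES = [
    "preparation", "volume_creation", "volume_preparation",
    "volume_mounting", "os_installation", "package_installation",
    "system_modification", "system_cleaning", "volume_unmounting",
    "image_registration", "cleaning",
]
POSITION = {name: index for index, name in enumerate(PHASES)}


def check_ordering(task_phase, predecessor_phase):
    """Return True if a predecessor's phase is not later than the task's."""
    return POSITION[predecessor_phase] <= POSITION[task_phase]


def sort_by_phase(tasks):
    """Sort (task name, phase name) pairs by phase; ties keep input order."""
    return sorted(tasks, key=lambda task: POSITION[task[1]])


tasks = [
    ("UnmountVolume", "volume_unmounting"),
    ("Bootstrap", "os_installation"),
    ("CreateVolume", "volume_creation"),
]
print([name for name, _phase in sort_by_phase(tasks)])
print(check_ordering("os_installation", "volume_mounting"))
```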

task_sets.py contains the definitions of the available tasks, imported from common.tasks. All tasks are grouped into arrays holding related tasks:

  • base_set
  • volume_set
  • partitioning_set
  • boot_partition_set
  • mounting_set
  • ssh_set
  • apt_set
  • locale_set
  • bootloader_set

tools.py contains functions related to logging.

Directory assets contains assets used during image creation. Currently it contains init.d directory.

Directory fs contains file-system-related classes; all such classes inherit from Volume defined in base.fs.volume.

Directory tasks contains the definitions of classes describing tasks, inheriting from base.task.Task. There are too many classes to describe, so I only list the files:

  • apt.py
  • boot.py
  • bootstrap.py
  • cleanup.py
  • development.py
  • filesystem.py
  • host.py
  • initd.py
  • locale.py
  • loopback.py
  • network.py
  • packages.py
  • partitioning.py
  • security.py
  • volume.py
  • workspace.py

plugins

This directory contains one subdirectory per plugin. Each subdirectory is a Python module. Each module contains __init__.py and tasks.py files; it might also contain manifest-schema.json, README.md, and an assets directory. The file tasks.py contains the definitions of the task classes provided by the plugin, inheriting from base.task.Task.

Available modules:

  • admin_user
  • build_metadata
  • cloud_init
  • image_commands
  • minimize_size
  • opennebula
  • prebootstrapped
  • root_password
  • unattended_upgrades
  • vagrant

providers

This directory contains the available cloud providers, which are the targets of generated images. Currently there are two providers:

  • ec2
  • virtualbox

Each directory is a Python module, containing __init__.py, manifest.py, and manifest-schema.json. It might also contain assets and tasks directories to define new tasks to be used when building an image.

manifests

This directory contains manifests which can be used to create Debian images. Those manifests can also serve as examples to be customized. Currently it contains:

  • ec2-ebs-debian-official-amd64-hvm.manifest.json
  • ec2-ebs-debian-official-amd64-pvm.manifest.json
  • ec2-ebs-debian-official-i386-pvm.manifest.json
  • ec2-ebs-partitioned.manifest.json
  • ec2-ebs-single.manifest.json
  • ec2-s3.manifest.json
  • virtualbox.manifest.json
  • virtualbox-vagrant.manifest.json

Summary

Providing official Debian images for the cloud is as important as providing ISO images. Having scripts that help with this task means that we can do it more easily and use the saved time for other tasks: more developing and less housekeeping. If you are interested in the cloud, join the Debian Cloud Team.
