I’ve just created new repository on GitHub: https://github.com/rybaktomasz/nuage It contains Python scripts intended to cooperate with bootstrap-vz, the code for building Debian images for various cloud providers. For now those are using Boto, and are only working with Amazon Web Services. There are also scripts simplifying a bit working with AWS, like running machines on EC2 and storing files on S3. They now support Python 2; as soon as Debian contains Boto with support for Python 3, I’ll move them to Python 3. I do not have far-reaching plans for this repository – I intend for those to be just my personal scripts. Do not expect frequent commits here. If you find those useful – good for you. If not – sorry, but those are just for me, not for everyone.
Yesterday (2014-05-26) my sponsor Piotr Ożarowski uploaded new version of PyOpenCL to Debian. Usually I can upload new versions of packages to Debian as I am Debian Maintainer. But this time it was very special upload. It was closing bug 723132, asking to move PyOpenCL from contrib to main. Because Debian contains free OpenCL implementations, Beignet and Mesa, one can run OpenCL programs using FLOSS code.
Moving package from contrib to main meant that PyOpenCL had to be removed from contrib, and uploaded anew to main. Thanks to Piotr for sponsoring it, and to FTP masters for accepting in from NEW, and dealing with all this removal/adding of package.
There is still work to do. Rebecca Palmer works on allowing for having all OpenCL implementations installed, which should lead for more experimentation and easier work with OpenCL but requires changes to many of the OpenCL-related packages. I’m also thinking about moving PyOpenCL to use pybuild, but this needs to wait till I have more free time.
Let’s hope that having PyOpenCL in main will allow for more people to find and use it.
I am member of Association for Computing Machinery. It’s organization providing (among other) access to magazines, news, books, and courses. Access is provided by web pages and through PDFs one can download, but there are also mobile applications. Here I describe 3 of them.
- CACM providing access to the monthly Communications of the ACM
- TechNews providing computer-science related news three times a week
- interactions providing access to interactions, bimonthly magazine for SIGCHI members
All are published under Google Play account Assoc. Computing Machinery but, as can be seen from their identifiers, were made by different companies.
All of them require providing ACM web account login and password to access content .
It provides access to news three times a week. There are usually 10 summaries with links to original articles. I find TechNews useful to keep myself up to date with trends in computer science. ACM sends TechNews as an email to its members, and provides
I do not use TechNews application very often. It shows the recent list of titles of mentioned articles and one can click on title and go to short text. This workflow is not very well-suiting for me. When I read TechNews from email I scroll and skim over all news. In application I would have to go over all the news, which takes more time than using email.
TechNews application could be more useful to browse archive. Unfortunately design here is rather bad. There are news every Monday, Wednesday, and Friday, except for when there is some holiday in the USA (ACM is located in New York). Unfortunately there is no way of knowing which day of the week we’ve chosen and whether there was newsletter sent on selected day without trying to fetch news – and failing. So after few tries I got discouraged. I do not use ability to bookmark interesting news – to do so I would need to interact with application more often.
Also, UI feels like it’s an iOS application. It might be OK on iOS, but on Android it feels alien and repulsive.
Writing this I realized that I might as well uninstall this application; I’ve started it maybe 3 times in the last year.
This is mobile version of magazine Communications of the ACM, the flagship publication of organization. It might be preferred way of reading CACM in electronic way. While I download PDFs with interesting articles, I do not like reading on the screen; I already spend too much time in front of computer. That’s why I also do not visit http://cacm.acm.org/.
Application does not display cover of the current issues so sometimes I have problems with getting to know which issue to read.
The first problem I had was with entering passwords. CACM has artificial limit on password length. It accepts only 15 characters, while ACM web account allows for 26 characters. Such lack of following policy is not very nice when trying to access content on mobile device.
The very first screen we see after logging in is not encouraging. It’s list of articles from the current issue, but it doesn’t feel like it’s magazine. It’s quite similar to the list of snippets from TechNews. Also many articles are just short pieces linking to web pages which means that I would need to be always online to use it. As a Luddite I disconnect my phone from the network when I’m not using it. Another problem with following the links is that (just like TechNews) application seems to be coming from iOS and not using Android technologies. Instead of using Application Chooser when following links, it opens its own embedded browser. It means that I do not have access to my saved passwords, and I cannot save bookmarks.
Embedded browser fails when trying to render HTML. It seems to have problems with displaying pages and zooming. It is not scrolled page – it was displayed like this, with half of content cut. I was able to scroll right, but not left.
Articles are presented as web pages. Instead of providing images or
tables in the text, or after clicking, they are located at the bottom so
one needs to scroll there, look at them, and scroll back – manually.
Locating tables outside of main text makes sense in paper magazine, but
not in special application, where user is able to click, but have not
have many clues for one’s location in the text.
I use it most often from all the applications I describe. It feels like the real magazine as one can see covers of magazines. It also offers ability to download issues for offline reading – downloaded ones are marked with the green triangle. It also offers two modes of reading: like magazine where pages are shown exactly like on paper, and web-page like. I find the latter nice while the former unusable – but maybe it would look better on large tablets.
I like yellow marking of active elements; after displaying new page application highlights elements which will respond to click with yellow. It shows to the user that presented content is interactive, that it’s not just scanned paper magazine.
There are problems though. Even though application caches issues for offline reading, images for the non-magazine layout sometimes go missing. They are displayed on the magazine layout and when online but are missing when one is offline. . It makes offline mode less usable.
There are other problems with offline mode. When applications sits for few hours in background in offline mode it requires refreshing credentials. It does not check that device is offline and there is no possibility to connect to the server. But hitting Back few times goes to the main screen and one is able to use application again, without need to login. Sometimes application ignores its cached content and behaves as it was started for the first time. In such a case one needs to connect – and after that application can again access downloaded data without any problems.
Application has problems with displaying pages in magazine layout; sometimes instead of displaying pages it display space between pages.
Again, just like CACM, interactions uses embedded browser instead of letting user pick one. This is especially funny when there is a link to YouTube, Vimeo or other video site. Embedded browser cannot cope with YouTube movies, so it’s is more frustrating than when reading paper magazine, where it’s natural that we cannot see the movie .
It’s good that companies and organizations are providing mobile applications. But applications should provide more than web pages. For now TechNews is just like mobile page, but in its own sandbox.
Applications should also integrate with the platform. Both CACM and interactions behave like there were written for iOS and then ported to Android without taking platform specificity into consideration. Using non-standards icons for sharing content and embedding browser instead of using system one brings feeling that something is wrong.
Applications also should feel like they are really part of company. Although CACM and interactions both are supposed to present magazines, they are completely different. They differ in how they present content, how they allow for browsing for archival issues, whether they allow for offline access. Lessons from interactions were not incorporated into CACM.
Applications also should integrate with environment. Both applications provide content from Digital Library and require logging. But when one saves bookmark or article there is no integration with Digital Library personal bookshelves.
Basically it looks like each application is its own serfdom. They are written by different companies (which can be seen from their IDs) and there is no knowledge transfer between them. There is no one person, committee or group in ACM responsible for mobile content. Described applications are published under account Assoc. for Computing Machinery. Recently there appeared application allowing for access to Digital Library (thus duplicating part of functionality of two described applications) from separate account Association for Computing Machinery. I find it strange, confusing, and meaning that nobody at ACM is able to deal with this mess.
To keep myself up to date I like to watch presentations from various conferences. Some time ago watched two keynotes: one from AWS re:Invent 2013, and another from Samsung Developers Conference. Both conferences were intended for developers to know new offerings of the companies, so keynotes were presenting new products and SDKs, and both included partners using mentioned SDKs in their own products.
Werner Vogels, Amazon CTO, presented re:Invent keynote. He presented interesting products: inclusion of PostgreSQL into Amazon RDS (finally!), Kinesis – new tool for analysing streams of data and CloudTrail, giving ability to record all AWS API calls into S3, allowing for better auditing of operations in the cloud.
But there was one moment which raised my hair. At 1:22:55 Vogels pointed to something he was wearing on his suit, and informed everyone that it is Narrative Clip made by company from Sweden – a camera which takes photo every 30 seconds and uploads it into Amazon S3. It is interesting usage of technology and I can see why he was eager to show it.
But Vogels told that he was wearing it all the time at the conference, during preparing of his talk, when talking with people, and so on. And this is when I felt strong disagreement with his eagerness to wear it. I felt as he betrayed trust of all the people who interacted with him. I know that at the conference there is no expectation of privacy, with everyone making photos, press teams making videos and promotional clips, and anyone able to overhear each other conversations. But in my opinion this is different. There is difference between having conversation and someone hearing it, and having conversation where other party records it. The latter brings lack of trust. There is reason why those are called “private conversations”. I’m sad that we, so rushing to try new technological gadget, like this Narrative Clip or Google Glass, seem to lose this trust in interpersonal relationships. Knowing that what I say and how I look could be exported to the cloud for all the worlds (or at least all the governments) to see means that I’ll not be sincere and instead of telling what I mean I’ll be thinking how what I say might be used against me now – or in a few years time. This is basically as if all the time I would be under Miranda warning – “everything what you say (or do) might be used against you”, and not only in official situations, but in (supposedly) innocent talk with other person.
Samsung keynote was presented by 6 to 8 Vice Presidents from Samsung (I lost count), and people from partner companies. Lack of one main presenter and trying to squeeze many unrelated products into one talk meant that I had no feeling of continuity I had watching re:Invent keynote.
This keynote also brought some privacy-related concerns, caused by Eric Edward Andersen, Vice President for Smart TV, presenting Smart TV SDK 5.0. He started his part of talk talking about emotional connection, about emotions related to interacting with content on TV screen. Then he presented new TV with quad-core CPU, which is apparently needed because “it’s (TV) is learning from your behaviour”. Do I really want for my TV to learn my behaviours? All the existing technologies assume that my taste is constant, and as soon as technology learns my behaviour and what I like it can start showing me what it suspects I like. But what about discovering new things? What about growing in life? YouTube tries to propose me some things it considers I might find interesting. One of the problems with it is that it tends to stick with some things I watched in the past. There was channel I was watching for some time and then stopped – but YouTube still puts it into proposals, after few months. At the same note, Google integration of services is really scary. I opened page about anime using Chrome (not my usual browser) and now YouTube proposes my anime to watch. OK, I might even find it interesting, but why it proposes those anime in italian?
Possible privacy violation was mentioned later, at 39:06. Andersen has shown some numbers for how long people are interacting with different applications on their smart TVs, for example how long Hulu or Netflix sessions are. I think the main idea was to show programmers that people are spending much time in front of TV, interacting with different applications and consuming content, so it would be wise to write software for smart TVs. But I had different feeling. Samsung having this data means that TV sends back information about usage to the mothership; Andersen mentioning how many people are “activating” their TVs seems to confirm this. LG was accused that their TVs are spying on users and sending data to the company; it looks like Samsung does something similar.
After seeing this, I am left wondering what is the advantage of smart TV? Why would one want to buy such TV to have it spying all the time? Orwell described quite well modern “Smart TV” in novel 1984 – he called them telescreens. Only inner-party members were able to turn off telescreens, and even they could not be sure whether device is still spying on them.
Another part of the presentation was given by Injong Rhee, Senior Vice President for Enterprise Mobile Communication Business. He was talking about Samsung KNOX, solution to help with managing devices for companies needs. This part of the presentation starts at 1:15:37. Rhee describes history of making KNOX:
What I have done.. I took my team to the drawing board to start reengineering and redesigning security architecture of Android. That’s how Samsung KNOX is born.
We actually put security mechanisms in each of those layers
We have implemented property called Mandatory Access Control or MAC (..) Security Enhancements for Android
and then describes difference between MAC and traditional triplets owner/group/other and read/write/execute.
what we have done with the MAC is that we define which system resources the process can access
Then Rhee presents Dual Personas – availability to have separate user accounts on one device. This is also functionality available in Android – separate user accounts are available in Android 4.2, ability to add restrictions to accounts available in Android 4.3 (“Support for Restricted Profiles”).
It left me with strange feeling. I do not know what is so unique with KNOX, as it just seem to be a different name for features already available in Android 4.3 – and, what a coincidence – KNOX is also available for Samsung devices with Android 4.3. Samsung probably added some interesting features and functionalities in the KNOX (maybe ability to manage those policies by management), but presentation did not distinguish between features added by KNOX and available in pure Android. This seems strange presented by Rhee who presented himself as former university professor. As the former professor he should know how to give proper attribution, how to cite others’ work, and how to mention what is unique in his work.
I noticed another strange manner. Samsung seem to have opinion that good API is large. Of course having rich enough set of components not restricting programmer is the sign of good API. On the other hand overgrown API means that there is to many things to remember, and it makes programming harder than it should be. Rhee, when talking about KNOX, described it (1:25:55) as “KNOX API which covers over 1000 APIs or more”, with slide containing “KNOX SDK: 1090+ APIs for Remote Device Control”. What does it really mean!? API (Application Programming Interface) is one – and it is set of types, classes, structures, methods, and so on. What does Samsung means by API then?
It seems that Samsung engineers are pumping numbers just to be able to show impressive, overgrown numbers. Samsung seems to have troubles with having to many devices and to many versions to manage. They even have troubles with updating their own devices. Combined with “me too” attitude (e.g. promising to use 64-bit CPUs in mobile phones after Apple presents 64-bit iPhone) it does not bring confidence in their ability to develop presented technologies, and (for example) to keep their smart TVs up to date. Unlike phones, which are (at least in Poland) changed every 18 or 24 months, during signing new contracts, TVs are changed less often. And people will grow disappointed when there is no update to their TVs, and each month something will stop working: YouTube changes video codecs and you cannot watch movies from the internet, Skype changes protocol and suddenly you cannot call people, and so on. Basically “smart” appliances need much more after-sale care than dumb ones, and companies (except for Apple which provides updates for their phones far longer than other phone manufacturers) do not seem to realize this.
Although there are some trends I strongly disagree with I’m glad that I have watched those keynotes. We definitely live in fast-paced times and although I’ve stopped trying to catch up with all the new technologies I think it is important to keep eye on what is proposed by various companies.
Cloud computing is gaining momentum. Debian has own team, Debian Cloud Team, created during the last DebConf in Switzerland, with Alioth page and mailing list. Team’s description is “We work to ensure that Debian, the Universal operating system, works well in public, private, and hybrid clouds.”
To ensure proper usage of Debian on the cloud we need to solve two main issues: we need to be able to create system image used by virtual machine, and we need to be able to configure virtual machine when it is started.
Every system needs configuration; it is usually done during installation and after first system run. During configuration we generate user accounts and passwords and decide which packages to install. After starting system we configure installed programs, deploy data (e.g. for web server) and so on.
Similar needs exist for cloud running machines. However, we do not install systems on virtual machines individually but use one of the available images. We also do not have console access to systems running in the cloud. We need to put SSH keys on the machine to ensure that we have SSH access to that machine. Also, as cloud usage usually means that we are running machines in large quantities, we should configure machine and its programs automatically. It is useful when machines are started and stopped without our intervention, e.g. for auto scaling, or when running machine as recovery after other one crashed.
There is also configuration related to the cloud. We might want to configure set of repositories used to install or update packages. It might be good idea to use specific repositories, e.g. provided by cloud providers, so we do not reach outside (thus avoiding costs related to network transfer) and maybe so we use packages provided by cloud provider, e.g. with kernels and drivers specific to hardware or virtualization solution used to run images on.
Most distributions use cloud-init, set of scripts written in Python, to configure virtual machine when it starts. It can be used on different cloud providers and can be used with different Linux distributions. It allows for providing user-data to deploy to images, and to provide script which will be run during startup.
Everyone can create own custom images to run on the cloud. Amazon provides documentation how to create AMI (image files) to use for running on EC2:
There is Debian wiki page describing how to create AMI, based on the official documentation. The process of creating AMI manually is rather complicated, involving many steps; wiki page warns that this is work in progress and is unfinished.
Debian wiki also contains page describing creation of Debian Installer images on AWS. There is no script for automatic creation of those images yet. Page contains steps one needs to follow to create images, in manner similar to describe in paragraph above. Creation of such images require cloud-init.
Instead of creating images ourselves we can use images created by others – there is market for available images. Companies and organizations can provide images created by them. Such images are more trusted by users as they have been created and configured by organizations responsible for software contained om such an image.
There is set of official Debian cloud images, just like there is set of official CD and DVD images. The most advanced situation is for Amazon Web Services AWS EC2; James Bromberger was delegated by the Debian Project Leader to manage Debian images on AWS Marketplace. He also maintains list of current images and manages AWS account which serves as the owner of provided images.
Creating images manually is long task and it is better to have scripts to create images. Extending script to allow for configuration of created images allows for experimentation or providing different images. This is the role of ”build-debian-cloud”, script written in Python, intended to build Debian image to different cloud IaaS providers. Currently AwS and VirtualBox are supported as providers of cloud solutions.
The build-debian-cloud source is hosted on GitHub and is currently developed by Anders Ingemann, which currently work on WIP-python branch on cloud-init and HVM support. It is forked from repository started by http://www.camptocamp.com/ but this original repository is inactive since April 2013.
Script uses JSON manifest file for configuring built images and it uses JSON schema for validation of provided configuration manifest. It requires:
- euca2ools – set of scripts to access AWS; I’m not sure whether aws-cli can be used instead
- boto – Python module to access AWS
It is task-based system with tasks organized into modules which eases configuration of created images. It logs many aspects of image creation process to help with solving problems and to provide feedback on image building It also provides rollback to recover from problems during image creation.
Repository contains following files and directories, described in sections below:
- base – Python code managing building of images.
- build-debian-cloud – Simple script, calling main() from ”base”
- common – Python code used by base, plugins, providers.
- CONTRIBUTING.md – Tips for extending build-debian-cloud. Source is not fully following PEP8, for example it uses tabs and spaces and allows for 110 columns. One can use pep8 with following options disabled to check source code: E101, E211, E241, E501, W191.
- logs – Directory for logs generated during image creation.
- manifests – Files with manifests for building different cloud images.
- plugins – Directory with various plugins which can enable functionality in built images.
- providers – Directory with modules for building images for different cloud providers.
Script is still work in progress. For example it currently can only build PVM-based AMIs, now HVM – so images built by it cannot be used to run GPU instances.
It contains basis of functionality which then uses information from other directories. It exports:
- Mainfest from manifest.py
- Phase from phase.py
- Task from task,py
- main from main.py
log.py defines logging functionality, including classes ConsoleFormatter and FileFormatter.
main.py defines functions used by build-debian-cloud script. main() parses arguments (calling get_args()), setups logging, and calls run(). run() loads manifest (class Manifest) and prepares list of tasks (class TaskList) to use according to manifest, using list of available task, available plugins and providers. Then it creates BoostrapInformation object using manifest. Then it calls tasklist.run() to execute all tasks in appropriate order. In case of exception it rolls back changes using task list.
manifest-schema.json contains JSON schema, which is used by code in manifest.py to check validity of manifest used to build image. Manifest contains following sections:
- release – only wheezy for now
manifest.py defines class Manifest, used to manage manifest describing image. It loads (load()), validates (validate()), and parses (parse()) JSON file. load() minifies JSON using function from minify_json.py, and loads all providers and plugins used in manifest. validate() validates manifest according to main schema, and to schemas defined for modules and providers – which can alter JSON schema. parse() exposes JSON as object attributes:
task.py defines abstract class Task, which will be used to implement tasks performed during creating images. Child classes must implement the method run(). Each Task contains phase (class Phase, defined in phase.py, used for ordering) and list of predecessors and successors.
tasklist.py defines class TaskList, used to order Tasks. All Tasks are kept in set, created in load(). Method run() calls create_list() to create the list of tasks to run, and then runs each task and adds it to list tasks_completed. create_list() uses check_ordering() to check validity of phases of predecessors and successors, then it checks for cycles in graph of dependencies by finding strongly_connected_components(), and then calls topological_sort() to order tasks so they can be run.
Directory pkg contains definition of classes responsible for managing packages. exceptions.py defines two exception classes: PackageError and SourceError. sourcelist.py defines two classes: Source describing one source package (deb-src), and SourceList describing set of source packages. packagelist.py defines PackageList, managing list of binary packages.
Partitions and Volumes
Directory fs contains definitions of classes used to manage partitions in created images. exceptions.py defines two exceptions: VolumeError and PartitionError. volume.py defines class Volume, used as base class by all other classes. Volume contains methods _after_create() and _check_blocking() which are used in appropriate moments when creating image. It defines set of events it can respond to:
- create, changing state from nonexistent to detached
- attach, changing state from detached to attached
- detach, changing state from attached to detached
- delete, changing state from detached to deleted
Directory partitions contains definitions of classes used to manage partitions. abstract.py defines AbstractPartition class. It contains methods related to events from partition map lifetime: _before_format(), mount() and _before_mount(), _after_mount(), _before_unmount(), add_mount(), remove_mount(), get_uuid(). It also defines events it can respond to:
- create, changing state from nonexistent to created
- format, changing state from created to formatted
- mount, changing state from formatted to mounted
- unmount, changing state from mounted to formatted
base.py defines class BasePartition which adds more event-related methods: create(), get_index(), get_start(), map(), _before_map(), _before_unmap(). Those are needed to manage partitions in partition maps. It also changes available states and events:
- create, changing state from nonexistent to unmapped
- map, changing state from unmapped to mapped
- format, changing state from mapped to formatted
- mount, changing state from formatted to mounted
- unmount, changing state from mounted’ to formatted
- unmap, changing state from formatted to unmapped_fmt
- map, changing state from unmapped_fmt to formatted
- unmap, changing state from mapped to unmapped
Other files define concrete partitions:
- gpt.py defines GPTPartition inheriting from BasePartition
- gpt_swap.py defines GPTSwapPartition inheriting from GPTPartition
- mbr.py defines MBRParition inheriting from BasePartition
- mbr_swap.py defines MBRSwapParittion inheriting from MBRPartition
- single.py defines SinglePartition inheriting from AbstractPartition.
Directory partitionmaps contains definitions of classes used to manage disk volumes and their relations to partitions. abstract.py defines class AbstractPartitionMap, used as base for all other classes from this directory. It contains methods related to events from partition map lifetime: create() and _before_create(), map() and _before_map(), unmap() and _before_unmap(), and is_blocking() and get_total_size(). It defines set of events it can respond to, just like Volume class:
- create, changing state from nonexistent to unmapped
- map, changing state from unmapped to mapped
- unmap, changing state from mapped to unmapped
Other files contain definitions of concrete classes:
- none.py defines NoPartitions
- gpt.py defines GPTPartitionMap
- mbr.py defines MBRParitionMap
This directory contains code used by all other classes – main script, plugins, and providers. exceptions.py defines three exceptions: ManifestError, TaskListError, and TaskError.
fsm_proxy.py defines FSMProxy class, being base class for all volume- and partition-related classes. It contains methods responsible for event listeners and proxy methods. phases.py creates Phase objects and puts them in order array:
task_sets.py contains definition of available tasks, imported from common.tasks. All tasks are grouped into arrays holding related tasks:
tools.py contains functions related to logging.
Directory assets contains assets used during image creation. Currently it contains init.d directory.
Directory fs contains file-system-related classes; all such classes inherit from Volume defined in base.fs.volume.
Directory tasks contains definition of classes describing tasks, inheriting from base.task.Task. There is too many classes to describe, so I only list files:
Contains directories with plugins. Each directory is Python module. Each module contains __init__.py and tasks.py files; it might contain manifest-schema.json, README.md, and assets directory. File tasks.py contains definition of task classes provided by the plugin, inheriting from base.task.Task.
Directory contains available cloud providers – targets of generated images. Currently there are two providers:
Each directory is Python module, containing __init__.py, manifest.py, and manifest-schema.json. It might also contain assets and tasks directories to define new tasks to be used when building image.
Directory contains manifests which can be used to create Debian images. Those manifests can also serve as examples which can be customized. Currently it contains:
Providing official Debian images for the cloud is as important as providing ISO images. Having scripts helping with this tasks means that we can do it easier and use saved time for other tasks, more developing and less housekeeping. If you are interested in the cloud join Debian Cloud Team.
Making sure that all packages are of necessary quality is hard work. That’s why there is freeze before releasing new Debian to make sure that there are no known release critical bugs.
There is “Collab QA” team whose role is “sharing results from QA tests (archive rebuilds, piuparts runs, and other static checks”. It is also present on Alioth where source code for various tools is hosted.
As noted in Collab QA description one of the teams responsibilities is rebuilding packages in Debian archive. Large rebuilds are needed to test new version of compiler (e.g. during transition from GCC 4.7 to 4.8) or when considering building package using LLVM-based compilers. Most of packages are built during uploading, but some QA and tests require rebuilding large parts of archive.
It requires large computational power. Thanks to the Amazon support, Debian can access EC2 to run some tasks, as noted by Lucas Nussbaum in his “bits from DPL – November 2013″.
Lucas Nussbaum (current Debian Project Leader) has wrote some scripts to rebuild and test packages on Amazon EC2. Scripts can be downloaded from git repository or cloned using git protocol. Scripts started as tool for rebuilding entire archive, and now they are used to test different versions of compilers (different gcc versions) and compiling of packages using clang. Currently Lucas is not actively developing the code; David Suarez and Sylvestre Ledru took that role.
Scripts are written in Ruby. I am not proficient in Ruby so please forgive any mistakes and misunderstanding.
Rebuilding is managed using one master node which always runs. While master nodes controls slave nodes it is not responsible for starting and stopping them. User is responsible for starting slave nodes (usually from own machine), and sending list of them to the master node. Default setup, described in README, uses 50 m1.medium nodes and 10 m2.xlarge nodes. Smaller nodes are used to compile small packages. Larger nodes are used to compile huge packages, needing much memory, like LibreOffice, x.org, etc.
Each slave node has one or more slots to deal with tasks; it means that it is possible to run tests in parallel, e.g. compile more than one package at the same time.
User is supposed to use AWS-CLI tools or other means to manage slave nodes. AWS-CLI is not yet part of Debian, although Taniguchi Takagi wants to package and upload them to Debian (Bug #733211). For the time being you can download AWS CLI source code from GitHub.
Spot instances are used to save on the costs. This is possible because compiling packages (especially for tests, not as the step during uploading package to Debian archive) is not time critical. It also is idempotent (we can compile package as many times as we want to) and it deals well with being stopped when spot instance is not available anymore.
All data is sent between nodes encoded using JSON. Using JSON allows for sending arrays and dictionaries, which means that is it easy to send the structure describing package, rebuild options, log, parameters, result, etc.
There is no communication between user machine and master node. User is supposed to SSH to master node, clone repository with scripts, and run scripts from inside of this repository. Master node communicates with slave nodes using SSH and SCP; it sends necessary scripts to slave nodes and then runs them.
Usual workflow is described in README:
- Request spot instances
- Wait for their start
- Connect to master node
- Prepare job description (list all packages to test)
- Run master script passing list of packages and list of nodes as arguments
- Wait for all tasks to finish
- Download result logs
- Stop all slave instances
JSON contains information about packages to compile. Each package is
described using following fields:
- type – Whether to test package compilation or installation (instest).
- package – Name of package to test.
- dist – Debian distribution to test on.
- esttime – Estimated time for performing test, used for building
- logfile – Name of file to write log to.
Repository contains many scripts, their names usually convey their jobs.
Scripts containing instest in their names are intended to test
installation or upgrade of packages.
- clean – removes all logs and JSON files describing tasks and slave nodes
- create-instest-chroots – creates chroot, debootstrap it, copies basic configuration, updates system, copies maintscripts; works with sid, squeeze, wheezy
- genereate-tasks-* – Scripts for generating JSON files describing tasks for master to distribute to slave nodes.
- genereate-tasks-instest – Read all packages from local repository and set them to test installation.
- genereate-tasks-rebuild – Read list of packages from Ultimate Debian Database, excluding some, create list. Allows for limiting packages based on their build time. Uses unstable chroot.
- genereate-tasks-rebuild-jessie – Script for build Jessie packages, using Jesse chroot.
- genereate-tasks-rebuild-wheezy – Script for build Wheezy packages, using Wheezy chroot.
- instest – Testing installation.
- masternode – Script run on master node, sending all tasks to slaves.
- merge-tasks – Merges JSON with description of tasks.
- process-task – Main script run on slave node.
- setup-ganglia – Installs Ganglia monitor on slave node, to monitoring it health.
- update – Updates chroot to newest versions.
It accepts files containing list of packages to test and list of slave nodes as command line arguments.
It connects to each slave node and uploads necessary scripts (instest, process-task) to them.
For each node it creates as many threads as there is slots; each thread opens one SSH and one SCP connection. Then each thread gets one task from task queue, and calls execute_one_task to process this task. Success is logged if task succeeds. Otherwise task is added to retry queue. If there is no tasks left in main queue, number of available slots on the slave node is decreased, and thread (except for the last one) ends.
The last thread for each node is responsible for dealing with failed tasks from retry queue. Again it loops for all available tasks, this time from retry queue, and calls execute_one_task for each of them. This time each task is run alone on the node, so problems caused by concurrent compilation (e.g. compiling PyCUDA and PyOpenCL with hardening options on machine with less than 4GB is problematic) should be solved. If task fails again it is not retried but only logged.
Script creates one additional thread which periodically (every minute) checks whether there are any tasks left in main and retry queues.
Script ends when all threads finish.
execute_one_task is simple function. It encodes task description into JSON and uploads JSON to slave node. Then it executes process_task on slave node and downloads log. It can also download built package from slave node and upload it to archive using reprepro script. Function returns whether test succeeded or not.
It is script run on slave node for each task. It reads JSON file with description of task passed as command line argument. If master node wants to test installation, it runs instest and exits. Otherwise it proceeds with testing package build.
Script can accept options governing package build process. For example it sets DEB_BUILDOPTIONS=parallel=10 when we want to test parallel build. It can also accept versions of compilers and libraries to use during compilation. Script sets repositories and package priorities to ensure that proper versions of build dependencies are used. Then script calls sbuild to build package, and checks whether estimation of time needed to perform test was correct.
It is used to test installation and update of package. It uses chroot to install package to.
Accepts chroot location and package to test as command line arguments. It cleans chroots and checks whether package is already installed. Script tests installation in various circumstances: it installs only dependencies or build dependencies, installs package, installs package and all packages recommended by it, installs package and all packages recommended and suggested by it. Script can also test upgrading package, to check whether there are some problems caused by upgrade.
There are some workarounds for MySQL and PostgreSQL; it looks like there are some problems with post-inst scripts (which try to connect to newly installed database) in those packages, so testing must take such failure into consideration.
Using cloud helps with running many tests in short time. Such tests can serve as QA tool and help for experimentation. Building packages in controlled environment, one which can easily be recreated and shut down allows for ensuring that packages are of good quality. At the same time ability to run many tests, and preparing different environments can help with experimentation, e.g. testing different compilers, configuration options, and so on.
Thanks for Amazon and to James Bromberger to providing grants allowing for Debian to use AWS and EC2 to perform such tests.