Two keynotes

To keep myself up to date I like to watch presentations from various conferences. Some time ago I watched two keynotes: one from AWS re:Invent 2013 and another from the Samsung Developers Conference. Both conferences were intended to introduce developers to the companies' new offerings, so the keynotes presented new products and SDKs, and both featured partners using those SDKs in their own products.

Werner Vogels, Amazon's CTO, presented the re:Invent keynote. He showed interesting products: the inclusion of PostgreSQL in Amazon RDS (finally!), Kinesis, a new tool for analysing streams of data, and CloudTrail, which records all AWS API calls into S3 and allows for better auditing of operations in the cloud.

But there was one moment which made my hair stand on end. At 1:22:55 Vogels pointed to something he was wearing on his suit and informed everyone that it was the Narrative Clip, made by a company from Sweden – a camera which takes a photo every 30 seconds and uploads it to Amazon S3. It is an interesting use of technology and I can see why he was eager to show it.

But Vogels said that he had been wearing it the whole time at the conference: while preparing his talk, while talking with people, and so on. And this is when I felt strong disagreement with his eagerness to wear it. I felt as if he had betrayed the trust of all the people who interacted with him. I know that at a conference there is no expectation of privacy, with everyone taking photos, press teams shooting videos and promotional clips, and anyone able to overhear someone else's conversation. But in my opinion this is different. There is a difference between having a conversation that someone happens to overhear and having a conversation that the other party records. The latter erodes trust. There is a reason why those are called “private conversations”. I am sad that we, rushing to try every new technological gadget, like the Narrative Clip or Google Glass, seem to be losing this trust in interpersonal relationships. Knowing that what I say and how I look could be exported to the cloud for all the world (or at least all the governments) to see means that I will not be sincere; instead of saying what I mean, I will be thinking about how what I say might be used against me now – or in a few years' time. It is basically as if I were under a permanent Miranda warning – “everything you say (or do) might be used against you” – not only in official situations, but in (supposedly) innocent talk with another person.

The Samsung keynote was presented by 6 to 8 Vice Presidents from Samsung (I lost count) and people from partner companies. The lack of one main presenter and the attempt to squeeze many unrelated products into one talk meant that I did not get the feeling of continuity I had while watching the re:Invent keynote.

This keynote also raised some privacy-related concerns, caused by Eric Edward Andersen, Vice President for Smart TV, presenting the Smart TV SDK 5.0. He began his part of the talk by discussing emotional connection, the emotions related to interacting with content on a TV screen. Then he presented a new TV with a quad-core CPU, which is apparently needed because “it’s (TV) is learning from your behaviour”. Do I really want my TV to learn my behaviours? All the existing technologies assume that my taste is constant: as soon as the technology learns my behaviour and what I like, it can start showing me what it suspects I will like. But what about discovering new things? What about growing in life? YouTube tries to propose things it considers I might find interesting. One of the problems is that it tends to stick with things I watched in the past. There was a channel I watched for some time and then stopped – but YouTube still puts it into its proposals, even after a few months. On the same note, Google's integration of services is really scary. I opened a page about anime using Chrome (not my usual browser) and now YouTube proposes anime for me to watch. OK, I might even find it interesting, but why does it propose those anime in Italian?

A possible privacy violation was mentioned later, at 39:06. Andersen showed some numbers for how long people interact with different applications on their smart TVs, for example how long Hulu or Netflix sessions are. I think the main idea was to show programmers that people spend much time in front of the TV, interacting with different applications and consuming content, so it would be wise to write software for smart TVs. But I was left with a different feeling. Samsung having this data means that the TV sends usage information back to the mothership; Andersen mentioning how many people are “activating” their TVs seems to confirm this. LG was accused of having TVs that spy on users and send data to the company; it looks like Samsung does something similar.

After seeing this, I am left wondering what the advantage of a smart TV is. Why would one want to buy such a TV and have it spying all the time? Orwell described the modern “Smart TV” quite well in his novel 1984 – he called them telescreens. Only Inner Party members were able to turn off their telescreens, and even they could not be sure whether the device was still spying on them.

Another part of the presentation was given by Injong Rhee, Senior Vice President for Enterprise Mobile Communication Business. He talked about Samsung KNOX, a solution to help companies manage their devices. This part of the presentation starts at 1:15:37. Rhee describes the history of making KNOX:

What I have done.. I took my team to the drawing board to start reengineering and redesigning security architecture of Android. That’s how Samsung KNOX is born.

and

We actually put security mechanisms in each of those layers

and

We have implemented property called Mandatory Access Control or MAC (..) Security Enhancements for Android

and then describes the difference between MAC and the traditional owner/group/other and read/write/execute triplets.

what we have done with the MAC is that we define which system resources the process can access

Basically it sounds like ordinary Security-Enhanced Linux, available in Android since 4.3 (“Android sandbox reinforced with SELinux”).

Then Rhee presents Dual Personas – the ability to have separate user accounts on one device. This functionality is also available in Android: separate user accounts appeared in Android 4.2, and the ability to add restrictions to accounts in Android 4.3 (“Support for Restricted Profiles”).

It left me with a strange feeling. I do not know what is so unique about KNOX, as it just seems to be a different name for features already available in Android 4.3 – and, what a coincidence, KNOX is also available for Samsung devices with Android 4.3. Samsung probably added some interesting features and functionality in KNOX (perhaps the ability for administrators to manage those policies centrally), but the presentation did not distinguish between features added by KNOX and those available in plain Android. This seems strange coming from Rhee, who introduced himself as a former university professor. As a former professor he should know how to give proper attribution, how to cite others' work, and how to point out what is unique in his own.

I noticed another strange attitude. Samsung seems to believe that a good API is a large one. Of course, having a rich enough set of components that does not restrict the programmer is the sign of a good API. On the other hand, an overgrown API means there are too many things to remember, and it makes programming harder than it should be. Rhee, when talking about KNOX, described it (1:25:55) as “KNOX API which covers over 1000 APIs or more”, with a slide reading “KNOX SDK: 1090+ APIs for Remote Device Control”. What does that really mean? An API (Application Programming Interface) is a single thing – a set of types, classes, structures, methods, and so on. What does Samsung mean by “API” then?

It seems that Samsung engineers are inflating numbers just to be able to show impressive, overgrown figures. Samsung seems to have trouble with having too many devices and too many versions to manage. They even have trouble updating their own devices. Combined with a “me too” attitude (e.g. promising to use 64-bit CPUs in mobile phones right after Apple presented the 64-bit iPhone), it does not inspire confidence in their ability to develop the presented technologies and (for example) to keep their smart TVs up to date. Unlike phones, which are (at least in Poland) changed every 18 or 24 months when signing a new contract, TVs are changed less often. People will grow disappointed when there is no update for their TV and each month something stops working: YouTube changes video codecs and you cannot watch movies from the internet, Skype changes its protocol and suddenly you cannot call people, and so on. Basically, “smart” appliances need much more after-sale care than dumb ones, and companies (except for Apple, which provides updates for its phones far longer than other phone manufacturers) do not seem to realize this.

Although there are some trends I strongly disagree with, I am glad that I watched those keynotes. We definitely live in fast-paced times, and although I have stopped trying to catch up with all the new technologies, I think it is important to keep an eye on what various companies are proposing.

Debian Cloud Images

Cloud computing is gaining momentum. Debian has its own team, the Debian Cloud Team, created during the last DebConf in Switzerland, with an Alioth page and a mailing list. The team's description is “We work to ensure that Debian, the Universal operating system, works well in public, private, and hybrid clouds.”

To ensure proper usage of Debian in the cloud we need to solve two main issues: we need to be able to create the system image used by a virtual machine, and we need to be able to configure the virtual machine when it is started.

cloud-init

Every system needs configuration; it is usually done during installation and after the first run of the system. During configuration we create user accounts and passwords and decide which packages to install. After starting the system we configure the installed programs, deploy data (e.g. for a web server), and so on.

Similar needs exist for machines running in the cloud. However, we do not install systems on virtual machines individually but use one of the available images. We also do not have console access to systems running in the cloud, so we need to put SSH keys on the machine to ensure that we can reach it over SSH. Also, as cloud usage usually means that we run machines in large quantities, we should configure the machine and its programs automatically. This is useful when machines are started and stopped without our intervention, e.g. for auto scaling, or when starting a machine to replace one that has crashed.

There is also configuration related to the cloud itself. We might want to configure the set of repositories used to install or update packages. It might be a good idea to use specific repositories, e.g. ones provided by the cloud provider, so that we do not reach outside (thus avoiding network transfer costs), and perhaps so that we use packages provided by the cloud provider, e.g. with kernels and drivers specific to the hardware or virtualization solution the images run on.

Most distributions use cloud-init, a set of scripts written in Python, to configure a virtual machine when it starts. It can be used with different cloud providers and with different Linux distributions. It allows for providing user-data which is deployed to the instance, for example a cloud-config document or a script which will be run during startup.
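As a minimal sketch (using boto, mentioned later in this post) of how user-data could be passed to a new EC2 instance for cloud-init to consume; the AMI ID, key name, and package choice below are placeholders, not values of any official Debian image:

    import boto.ec2

    # A minimal cloud-config document; cloud-init interprets it on first boot.
    # The SSH key and package name are only placeholders.
    user_data = """#cloud-config
    packages:
      - nginx
    ssh_authorized_keys:
      - ssh-rsa AAAA... admin@example.org
    """

    conn = boto.ec2.connect_to_region("eu-west-1")
    reservation = conn.run_instances(
        "ami-00000000",           # placeholder AMI ID
        key_name="my-key",        # placeholder key pair name
        instance_type="t1.micro",
        user_data=user_data)      # cloud-init reads this during startup
    print(reservation.instances[0].id)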

cloud-init is developed on Launchpad and its source code is kept in a Bazaar repository. It has good documentation. Its data files are kept in /var/lib/cloud; configuration is kept as YAML files.

Debian contains a cloud-init package. Because of some packaging problems cloud-init was not included in Wheezy (the latest stable release), but it is available in wheezy-backports.

Image creation

Anyone can create their own custom images to run in the cloud. Amazon provides documentation describing how to create AMIs (image files) for running on EC2.

There is a Debian wiki page describing how to create an AMI, based on the official documentation. The process of creating an AMI manually is rather complicated and involves many steps; the wiki page warns that it is a work in progress and unfinished.

The Debian wiki also contains a page describing the creation of Debian Installer images on AWS. There is no script for automatic creation of those images yet. The page lists the steps one needs to follow to create the images, in a manner similar to the one described in the paragraph above. Creating such images requires cloud-init.

Instead of creating images ourselves we can use images created by others – there is a market of available images. Companies and organizations can provide images they have created. Such images are trusted more by users, as they have been created and configured by the organizations responsible for the software contained in them.

There is a set of official Debian cloud images, just like there is a set of official CD and DVD images. The situation is most advanced for Amazon Web Services (AWS) EC2; James Bromberger was delegated by the Debian Project Leader to manage Debian images on the AWS Marketplace. He also maintains the list of current images and manages the AWS account which serves as the owner of the provided images.

build-debian-cloud

Creating images manually is a long task, and it is better to have scripts to create them. Extending the script to allow for configuration of the created images allows for experimentation and for providing different images. This is the role of "build-debian-cloud", a script written in Python, intended to build Debian images for different cloud IaaS providers. Currently AWS and VirtualBox are supported as providers.

The build-debian-cloud source is hosted on GitHub and is currently developed by Anders Ingemann, who currently works on the WIP-python branch adding cloud-init and HVM support. It was forked from a repository started by http://www.camptocamp.com/, but that original repository has been inactive since April 2013.

The script uses a JSON manifest file for configuring the built images, and it uses a JSON schema to validate the provided configuration manifest. It requires:

  • debootstrap
  • parted
  • grub2
  • jsonschema
  • qemu-utils
  • euca2ools – a set of scripts to access AWS; I'm not sure whether aws-cli can be used instead
  • boto – a Python module to access AWS

It is a task-based system with tasks organized into modules, which eases configuration of the created images. It logs many aspects of the image creation process, to help with solving problems and to provide feedback on image building. It also provides rollback to recover from problems during image creation.

The repository contains the following files and directories, described in the sections below:

  • base – Python code managing building of images.
  • build-debian-cloud – Simple script calling main() from "base"
  • common – Python code used by base, plugins, providers.
  • CONTRIBUTING.md – Tips for extending build-debian-cloud. The source does not fully follow PEP 8; for example, it uses both tabs and spaces and allows for 110 columns. One can use pep8 with the following checks disabled to verify the source code: E101, E211, E241, E501, W191.
  • logs – Directory for logs generated during image creation.
  • manifests – Files with manifests for building different cloud images.
  • plugins – Directory with various plugins which can enable functionality in built images.
  • providers – Directory with modules for building images for different cloud providers.
  • README.md

The script is still a work in progress. For example, it can currently only build PVM-based AMIs, not HVM – so images built by it cannot be used to run GPU instances.

base

This directory contains the core functionality, which uses information from the other directories. It exports:

  • Manifest from manifest.py
  • Phase from phase.py
  • Task from task.py
  • main from main.py

log.py defines the logging functionality, including the classes ConsoleFormatter and FileFormatter.

main.py defines the functions used by the build-debian-cloud script. main() parses arguments (calling get_args()), sets up logging, and calls run(). run() loads the manifest (class Manifest) and prepares the list of tasks (class TaskList) to use according to the manifest, using the list of available tasks, plugins, and providers. It then creates a BootstrapInformation object from the manifest and calls tasklist.run() to execute all tasks in the appropriate order. In case of an exception it rolls back the changes using the task list.

manifest-schema.json contains the JSON schema, which is used by the code in manifest.py to check the validity of the manifest used to build an image. The manifest contains the following sections (see the validation sketch after this list):

  • provider
  • bootstrapper
    • mirror
    • workspace
    • tarball
  • system
    • release – only wheezy for now
    • architecture
    • bootloader
    • timezone
    • locale
    • charmap
  • packages
    • mirror
    • sources
    • remote
    • local
  • volume
  • plugins

manifest.py defines the class Manifest, used to manage the manifest describing an image. It loads (load()), validates (validate()), and parses (parse()) the JSON file. load() minifies the JSON using a function from minify_json.py and loads all providers and plugins used in the manifest. validate() validates the manifest against the main schema and against the schemas defined by modules and providers – which can alter the JSON schema. parse() exposes the JSON as object attributes:

  • provider
  • bootstrapper
  • image
  • volume
  • system
  • packages
  • plugins

task.py defines the abstract class Task, which is used to implement the tasks performed while creating images. Child classes must implement the run() method. Each Task has a phase (class Phase, defined in phase.py, used for ordering) and lists of predecessors and successors.

tasklist.py defines the class TaskList, used to order Tasks. All Tasks are kept in a set created in load(). The run() method calls create_list() to create the list of tasks to run, then runs each task and adds it to the tasks_completed list. create_list() uses check_ordering() to check the validity of the phases of predecessors and successors, then checks for cycles in the dependency graph by finding strongly_connected_components(), and finally calls topological_sort() to order the tasks so they can be run. A rough sketch of this pattern is shown below.
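
The following is only an illustrative sketch of the task ordering and rollback pattern described above, not code from the repository; the class and function names are invented for the example:

    class Task(object):
        predecessors = []      # tasks that must run before this one

        def run(self, info):
            raise NotImplementedError

        def rollback(self, info):
            pass               # undo whatever run() did, if possible

    def topological_sort(tasks):
        # Very small depth-first topological sort; assumes no cycles,
        # which the real code checks for separately.
        ordered, visited = [], set()

        def visit(task):
            if task in visited:
                return
            visited.add(task)
            for dep in task.predecessors:
                visit(dep)
            ordered.append(task)

        for task in tasks:
            visit(task)
        return ordered

    def run_all(tasks, info):
        completed = []
        try:
            for task in topological_sort(tasks):
                task.run(info)
                completed.append(task)
        except Exception:
            # Roll back the completed tasks in reverse order.
            for task in reversed(completed):
                task.rollback(info)
            raise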

The directory pkg contains the definitions of classes responsible for managing packages. exceptions.py defines two exception classes: PackageError and SourceError. sourcelist.py defines two classes: Source, describing one source package (deb-src), and SourceList, describing a set of source packages. packagelist.py defines PackageList, managing the list of binary packages.

Partitions and Volumes

The directory fs contains the definitions of classes used to manage volumes and partitions in created images. exceptions.py defines two exceptions: VolumeError and PartitionError. volume.py defines the class Volume, used as the base class by all other classes. Volume contains the methods _after_create() and _check_blocking(), which are called at appropriate moments when creating an image. It defines the set of events it can respond to (a minimal state-machine sketch follows the list below):

  • create, changing state from nonexistent to detached
  • attach, changing state from detached to attached
  • detach, changing state from attached to detached
  • delete, changing state from detached to deleted
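
As a minimal sketch of this kind of event-driven state machine (invented names, not the FSMProxy implementation from the repository):

    class SimpleVolume(object):
        # Allowed transitions: event -> (from_state, to_state)
        transitions = {
            "create": ("nonexistent", "detached"),
            "attach": ("detached", "attached"),
            "detach": ("attached", "detached"),
            "delete": ("detached", "deleted"),
        }

        def __init__(self):
            self.state = "nonexistent"

        def fire(self, event):
            source, target = self.transitions[event]
            if self.state != source:
                raise RuntimeError(
                    "cannot %s while in state %s" % (event, self.state))
            self.state = target

    volume = SimpleVolume()
    volume.fire("create")   # nonexistent -> detached
    volume.fire("attach")   # detached -> attached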

The directory partitions contains the definitions of classes used to manage partitions. abstract.py defines the AbstractPartition class. It contains methods related to events from the partition lifetime: _before_format(), mount() and _before_mount(), _after_mount(), _before_unmount(), add_mount(), remove_mount(), and get_uuid(). It also defines the events it can respond to:

  • create, changing state from nonexistent to created
  • format, changing state from created to formatted
  • mount, changing state from formatted to mounted
  • unmount, changing state from mounted to formatted

base.py defines the class BasePartition, which adds more event-related methods: create(), get_index(), get_start(), map(), _before_map(), _before_unmap(). Those are needed to manage partitions inside partition maps. It also changes the available states and events:

  • create, changing state from nonexistent to unmapped
  • map, changing state from unmapped to mapped
  • format, changing state from mapped to formatted
  • mount, changing state from formatted to mounted
  • unmount, changing state from mounted to formatted
  • unmap, changing state from formatted to unmapped_fmt
  • map, changing state from unmapped_fmt to formatted
  • unmap, changing state from mapped to unmapped

Other files define concrete partitions:

  • gpt.py defines GPTPartition inheriting from BasePartition
  • gpt_swap.py defines GPTSwapPartition inheriting from GPTPartition
  • mbr.py defines MBRPartition inheriting from BasePartition
  • mbr_swap.py defines MBRSwapPartition inheriting from MBRPartition
  • single.py defines SinglePartition inheriting from AbstractPartition.

The directory partitionmaps contains the definitions of classes used to manage disk volumes and their relation to partitions. abstract.py defines the class AbstractPartitionMap, used as the base for all other classes in this directory. It contains methods related to events from the partition map lifetime: create() and _before_create(), map() and _before_map(), unmap() and _before_unmap(), as well as is_blocking() and get_total_size(). It defines the set of events it can respond to, just like the Volume class:

  • create, changing state from nonexistent to unmapped
  • map, changing state from unmapped to mapped
  • unmap, changing state from mapped to unmapped

Other files contain definitions of concrete classes:

  • none.py defines NoPartitions
  • gpt.py defines GPTPartitionMap
  • mbr.py defines MBRPartitionMap

common

This directory contains code used by all other parts – the main script, plugins, and providers. exceptions.py defines three exceptions: ManifestError, TaskListError, and TaskError.

fsm_proxy.py defines the FSMProxy class, the base class for all volume- and partition-related classes. It contains the methods responsible for event listeners and proxy methods. phases.py creates the Phase objects and puts them in the order array:

  1. preparation
  2. volume_creation
  3. volume_preparation
  4. volume_mounting
  5. os_installation
  6. package_installation
  7. system_modification
  8. system_cleaning
  9. volume_unmounting
  10. image_registration
  11. cleaning

task_sets.py contains definitions of the available tasks, imported from common.tasks. All tasks are grouped into arrays holding related tasks:

  • base_set
  • volume_set
  • partitioning_set
  • boot_partition_set
  • mounting_set
  • ssh_set
  • apt_set
  • locale_set
  • bootloader_set

tools.py contains functions related to logging.

The directory assets contains assets used during image creation. Currently it contains an init.d directory.

The directory fs contains file-system-related classes; all such classes inherit from Volume, defined in base.fs.volume.

The directory tasks contains the definitions of classes describing tasks, inheriting from base.task.Task. There are too many classes to describe, so I will only list the files:

  • apt.py
  • boot.py
  • bootstrap.py
  • cleanup.py
  • development.py
  • filesystem.py
  • host.py
  • initd.py
  • locale.py
  • loopback.py
  • network.py
  • packages.py
  • partitioning.py
  • security.py
  • volume.py
  • workspace.py

plugins

This directory contains directories with plugins. Each directory is a Python module. Each module contains __init__.py and tasks.py files; it might also contain manifest-schema.json, README.md, and an assets directory. The file tasks.py contains the definitions of the task classes provided by the plugin, inheriting from base.task.Task; a sketch of what such a task might look like follows the list of modules below.

Available modules:

  • admin_user
  • build_metadata
  • cloud_init
  • image_commands
  • minimize_size
  • opennebula
  • prebootstrapped
  • root_password
  • unattended_upgrades
  • vagrant
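
Purely as an illustration of the plugin task shape described above (hypothetical names; this is not code taken from any of the listed plugins, and the import paths only follow the layout described earlier), a plugin's tasks.py might contain something like:

    from base import Task
    from common import phases

    class AddMarkerFile(Task):
        # Run during system modification, after the base system is installed.
        phase = phases.system_modification

        def run(self, info):
            # info carries shared bootstrap state; 'root' is a hypothetical
            # attribute standing in for the path of the chroot being built.
            marker_path = info.root + '/root/created-by-plugin'
            with open(marker_path, 'w') as marker:
                marker.write('example plugin task ran here\n')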

providers

This directory contains the available cloud providers – the targets of the generated images. Currently there are two providers:

  • ec2
  • virtualbox

Each directory is a Python module containing __init__.py, manifest.py, and manifest-schema.json. It might also contain assets and tasks directories defining new tasks to be used when building an image.

manifests

This directory contains manifests which can be used to create Debian images. These manifests can also serve as examples to customize. Currently it contains:

  • ec2-ebs-debian-official-amd64-hvm.manifest.json
  • ec2-ebs-debian-official-amd64-pvm.manifest.json
  • ec2-ebs-debian-official-i386-pvm.manifest.json
  • ec2-ebs-partitioned.manifest.json
  • ec2-ebs-single.manifest.json
  • ec2-s3.manifest.json
  • virtualbox.manifest.json
  • virtualbox-vagrant.manifest.json

Summary

Providing official Debian images for the cloud is as important as providing ISO images. Having scripts to help with this task means that we can do it more easily and use the saved time for other work – more developing and less housekeeping. If you are interested in the cloud, join the Debian Cloud Team.