Changes between Version 17 and Version 18 of VirtualBox


Timestamp:
Oct 20, 2009, 4:55:25 AM (15 years ago)
Author:
dgquintas
Comment:

Updated to latest version

  • VirtualBox

    v17 v18  
    55== "Logistic" advantages ==
    66
    7  1. One order of magnitude lighter, both its installation package (~35 MB) and
    8  its installed size (~60 MB). Compare with the 500+ MB of VMWare Server 2.0,
     7 1. One order of magnitude lighter, both its installation package (~70 MB) and
     8 its installed size (~80 MB). Compare with the 500+ MB of VMWare Server 2.0,
    99 which grows by some 150 extra MB when installed.
    1010 1. License. Its OSE (Open Source Edition) is published under the GPL v.2, but
    1111 even the non-libre version -PUEL,
    12  [http://www.virtualbox.org/wiki/VirtualBox_PUEL Personal Use and Evaluation
    13  License]- could be used for our purposes, but that's something to be checked
    14  by someone who actually knows something about licensing, unlike myself.
     12 [http://www.virtualbox.org/wiki/VirtualBox_PUEL Personal Use and Evaluation License]- could be used for our purposes, but that's something to be checked
     13 by someone who actually knows something about licensing, unlike myself.
    1514 1. Faster and "less painful" installation process, partly due to its lighter
    1615 weight. No license number required, hence less hassle for the user.
     
    2019The interaction with the VM is made possible even from the command line, in
    2120particular from the single command `VBoxManage` (extensive doc available in
    22 [http://download.virtualbox.org/virtualbox/2.2.2/UserManual.pdf the manual]). Of
    23 particular interest for us are the following VBoxManager's arguments:
     21[http://www.virtualbox.org/manual/UserManual.html the manual]). The following `VBoxManage` arguments are particularly interesting:
    2422    - startvm
    2523    - controlvm  pause|resume|reset|poweroff|savestate ...
     
    2927    - registervm
    3028
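As a minimal sketch of how these subcommands could be driven from a script, the helpers below only build the argument lists for `VBoxManage startvm` and `VBoxManage controlvm`. The helper names and the VM name `cernvm` are illustrative assumptions, and nothing is executed here.

```python
# Actions listed above for "controlvm".
CONTROLVM_ACTIONS = ("pause", "resume", "reset", "poweroff", "savestate")


def vboxmanage_args(subcommand, vm_name, *extra):
    """Build the argument list for a single VBoxManage invocation."""
    return ["VBoxManage", subcommand, vm_name, *extra]


def controlvm_args(vm_name, action):
    """Build a 'controlvm' invocation, rejecting actions not listed above."""
    if action not in CONTROLVM_ACTIONS:
        raise ValueError("unsupported controlvm action: %r" % (action,))
    return vboxmanage_args("controlvm", vm_name, action)


# "cernvm" is a hypothetical VM name used only for illustration.
start_cmd = vboxmanage_args("startvm", "cernvm")
pause_cmd = controlvm_args("cernvm", "pause")
```

The resulting lists can be handed to `subprocess.run(start_cmd, check=True)` on a machine where `VBoxManage` is on the `PATH`.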
    31 All the functionalities exposed by this command are also available throughout
    32 a C++ COM/XPCOM based API, as well as Python bindings. However, the `VBoxManage`
    33 is already ported to several platforms and it's flexible enough as to be relied on
    34 to interact with !VirtualBox.
     29All the functionality exposed by this command is also available through
     30a C++ COM/XPCOM based API, Python bindings and SOAP based web services.
    3531
    3632Following the capabilities enumeration introduced by Kevin, !VirtualBox would
    3733compare to his analysis based on VMWare Server as follows:
    3834
    39  1. Manage the Image.  Covered by the "`snapshot`" command 
    40  1. Boot the virtual machine. Covered by "`startvm`" 
     35 1. Manage the Image.  Covered by the "`snapshot`" command
     36 1. Boot the virtual machine. Covered by "`startvm`"
    4137 1. Copy files host -> guest: '''Not''' directly supported by the !VirtualBox API.
    4238 We'd need to resort to external solutions
    4339 such as the one detailed below based on [http://www.cs.wisc.edu/condor/chirp/ Chirp].
    4440 1. Run a program on the guest. Same as 3.
    45  1. Pause and the guest. Covered by "`controlvm pause/resume`" 
     41 1. Pause and resume the guest. Covered by "`controlvm pause/resume`"
    4642 1. Retrieve files from the guest.  See 3 and 4, same situation.
    4743 1. Shut down the guest. Covered by "`controlvm poweroff`"
     
    4945
    5046== Bindings ==
    51 In case the direct usage of the `VBoxManage` command wouldn't be appropriate,
    52 it's possible to fallback to the low-level API.
    53 Both VMWare Server and !VirtualBox make available C/C++ APIs, as well as
    54 Python, with different levels of support -in case of VMWare, it's an
    55 unsupported project.  !VirtualBox's API is based on COM/XPCOM, and it's
    56 possible to implement a unified windows/linux approach based on the former
    57 technology. The actual code implementing the [http://www.virtualbox.org/browser/trunk/src/VBox/Frontends/VBoxManage VBoxManage]
    58 command is a very good reference.
    59 Therefore, implementing a "hypervisor abstraction layer" is in principle
    60 feasible, with a common win/linux codebase both for VIX and !VirtualBox API.
     47Despite `VBoxManage` being an excellent debugging and testing tool, it's not enough for our purposes. We'll need access to some
     48deeper structures not made available to such a high level tool.
     49
     50The question now comes to which of the available bindings to use.
     51VirtualBox's API is ultimately based on COM/XPCOM. It'd be
     52possible to implement a unified windows/linux approach based on these technologies, as demonstrated by the aforementioned
     53[http://www.virtualbox.org/browser/trunk/src/VBox/Frontends/VBoxManage VBoxManage] command. On the other hand, this isn't a simple task: it is full
     54of quirks and platform specific pitfalls (COM is used on Windows, whereas Linux and presumably Mac OS X resort to XPCOM).
     55
     56The Python bindings sound promising. Unfortunately, they aren't distributed with most of the pre-built binaries available at the !VirtualBox webpage.
     57
     58We are left with the SOAP based web services. This is a sufficiently well known mechanism as to have proper support on the three supported systems. Moreover, the [http://dlc.sun.com/virtualbox/vboxsdkdownload.html VirtualBox SDK] includes a good deal of Python code tailored for interacting with it.
     59This is the way the current implementation has gone.
     60
    6161
    6262== Interacting with the VM Appliance ==
     
    8282== Introduction ==
    8383In previous sections, two limitations of the API offered by !VirtualBox
    84 were pointed out. Namely, the inability to directly support the 
    85 execution of command and file copying between the host and the guest. 
     84were pointed out. Namely, the inability to directly support the
     85execution of commands and the copying of files between the host and the guest.
    8686While relatively straightforward solutions exist, notably the usage of SSH,
    8787they raise issues of their own: the guest needs to (properly) configure this
     
    9090Thus, the requirements for a satisfactory solution would include:
    9191
    92   * Minimal or no configuration required on the guest side. 
    93   * No assumptions on the network reachability of the guest. Ideally, 
     92  * Minimal or no configuration required on the guest side.
     93  * No assumptions on the network reachability of the guest. Ideally,
    9494    guests should be isolated from "the outside world" as much as possible.
    9595
     
    9797
    9898  * Scalability. The solution should account for the execution of an arbitrary
    99     number of guests on a given host. 
     99    number of guests on a given host.
    100100  * Technology agnostic: dependencies on any platform/programming
    101101    language/hypervisor should be kept to a minimum or avoided altogether.
     
    103103
    104104== Proposed Solution ==
    105 Following Predrag Buncic's advice, I began looking into such a solution based on
    106 asynchronous message passing. In order to keep the footprint, both on the host and the guest sides,
    107 the [http://stomp.codehaus.org/Protocol STOMP protocol]
    108 came to mind. The protocol is simple enough as to have implementations in a
    109 large number of programming languages, while fulfilling all flexibility needs. Despite its
     105A very promising solution based on asynchronous message passing was proposed by Predrag Buncic.
     106The lightweight [http://stomp.codehaus.org/Protocol STOMP protocol] has been considered, in order
     107to keep the footprint small. This protocol is simple enough to have implementations in a
     108large number of programming languages, while still fulfilling all flexibility needs. Despite its
    110109simplicity and being relatively unheard of, ActiveMQ supports it out-of-the-box (even though
    111110it'd be advisable to use something lighter for a broker).
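To give an idea of how simple the protocol is, here is a minimal sketch of STOMP framing: a command line, `key:value` header lines, a blank line, the body and a terminating NUL byte. The helper names and the destination `/topic/example` are assumptions made for illustration, not names taken from the prototype.

```python
def build_frame(command, headers, body=""):
    """Serialize a STOMP frame: command, headers, blank line, body, NUL."""
    lines = [command]
    lines.extend("%s:%s" % (key, value) for key, value in headers.items())
    return "\n".join(lines) + "\n\n" + body + "\x00"


def parse_frame(raw):
    """Split a serialized frame back into (command, headers, body)."""
    head, _, body = raw.rstrip("\x00").partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body


# "/topic/example" is a hypothetical destination name.
frame = build_frame("SEND", {"destination": "/topic/example"}, "CMD_RUN")
command, headers, body = parse_frame(frame)
```

A real client would also have to handle the CONNECT/CONNECTED handshake and subscriptions; libraries such as the ones listed in the dependencies section take care of that.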
     
    113112Focusing on the problem at hand, we need to tackle the following problems:
    114113
    115   * Command execution on the guest
     114  * Command execution on the guest (+ resource usage accounting for proper crediting).
    116115  * File transfer from the host to the guest
    117116  * File transfer from the guest to the host
     
    125124  host and the guests need to share some knowledge about the broker's location, if it's going
    126125  to be running on an independent machine. Otherwise, it can be assumed that it listens on the
    127   host's IP. Moreover, this can always be assumed if an appropriate port forwarding mechanism 
    128   is put in place in the host in order to route the connections to the broker. 
     126  host's IP. Moreover, this can always be assumed if an appropriate port forwarding mechanism
     127  is put in place in the host in order to route the connections to the broker.
    129128 
    130   The recent release of the 2.2 series of !VirtualBox is a very convenient one: the newly introduced
    131   host-only networking feature fits our needs like a glove. From
    132   [http://download.virtualbox.org/virtualbox/2.2.2/UserManual.pdf the manual] (section 6.7):
     129  The addition, in version 2.2, of the host-only networking feature was really convenient. From
     130  [http://www.virtualbox.org/manual/UserManual.html#network_hostonly the relevant section] of the manual:
    133131
    134132    Host-only networking is another networking mode that was added with version 2.2
     
    146144    virtual machines cannot be seen, the traffic on the “loopback” interface on the host
    147145    can be intercepted.
    148     
     146   
    149147  That is to say, we have our own virtual "ethernet network". On top of that, !VirtualBox
    150148  provides an easily configurable DHCP server that makes it possible to set a fixed IP for the
     
    158156  message passing infrastructure: a tailored message addressed to the guest we want to
    159157  run the command on is published, processed by this guest and eventually answered back
    160   with some sort of status (maybe even periodically in order to feedback about progress). 
     158  with some sort of status (maybe even periodically in order to give feedback about progress).
    161159
    162160  Given the subscription-based nature of the system, several guests can be addressed at
    163161  once by a single host, triggering the execution of commands (or any other action
    164   covered by this mechanism) in a single go. Note that neither the hosts nor the 
     162  covered by this mechanism) in a single go. Note that neither the hosts nor the
    165163  (arbitrary number of) guests need to know how many of the latter make up the system:
    166164  new guest instances need only subscribe to these "broadcasted" messages on their own
     
    170168  === File Transfers ===
    171169  This is a trickier feature: transfers must be bidirectional, yet we want to avoid any kind
    172   of exposure or (complex) configuration. 
     170  of exposure or (complex) configuration.
    173171
    174172  The proposed solution takes advantage of the [http://www.cse.nd.edu/~ccl/software/chirp/ Chirp protocol and set of tools].
    175173  This way, we don't even require privileges to launch the server instances. Because
    176174  the file sharing must remain private, the chirp server is run on the guests. The host agent
    177   would act as a client that'd send or retrieve files. We spare ourselves from all the 
     175  would act as a client that'd send or retrieve files. We spare ourselves from all the
    178176  gory details involved in the actual management of the transfers, delegating the job
    179177  to chirp (which deals with it brilliantly, by the way).
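A rough sketch of how the host agent could delegate a transfer to the chirp command line tools follows. It assumes the `chirp_put` and `chirp_get` tools with their usual argument order (`chirp_put <local-file> <host> <remote-file>` and `chirp_get <host> <remote-file> <local-file>`); the helper names, paths and the guest IP are hypothetical, and the commands are only built, not run.

```python
import os


def chirp_put_args(chirp_path, local_file, guest_ip, remote_file):
    """Arguments to copy a file from the host to the guest's chirp server."""
    tool = os.path.join(chirp_path, "chirp_put")
    return [tool, local_file, guest_ip, remote_file]


def chirp_get_args(chirp_path, guest_ip, remote_file, local_file):
    """Arguments to retrieve a file from the guest's chirp server."""
    tool = os.path.join(chirp_path, "chirp_get")
    return [tool, guest_ip, remote_file, local_file]


# All values below are hypothetical and only illustrate the call shape.
put_cmd = chirp_put_args("/opt/cctools/bin", "input.dat", "192.168.56.101", "/input.dat")
get_cmd = chirp_get_args("/opt/cctools/bin", "192.168.56.101", "/output.dat", "output.dat")
```

Each list can then be passed to `subprocess.run(..., check=True)` once the guest's address is known.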
    180178
    181   The only bit missing in this argumentation is that the host needs to be aware of the guests' 
    182   IP addresses in order to communicate with these chirp servers. This is a no-issue, as the 
     179  The only bit missing in this argument is that the host needs to be aware of the guests'
     180  IP addresses in order to communicate with these chirp servers. This is a non-issue, as the
    183181  custom STOMP-based protocol implemented makes it possible for the guests to "shout out" their
    184182  details so that the host can keep track of every single one of them.
     
    188186  * Where should the broker live? Conveniently on the same machine as the hypervisor or on
    189187    a third host? Maybe even a centralized and widely known (ie, standard) one? This last option
    190     might face congestion problems, though. 
    191   * Broker choice. Full-fledged ([http://activemq.apache.org/ ActiveMQ]) or more limited but lighter? 
    192     (ie, [http://www.germane-software.com/software/Java/Gozirra/ Gozirra]). On this 
     188    might face congestion problems, though.
     189  * Broker choice. Full-fledged ([http://activemq.apache.org/ ActiveMQ]) or more limited but lighter?
     190    (ie, [http://www.germane-software.com/software/Java/Gozirra/ Gozirra]). On this
    193191    question, unless a centralized broker is universally used, the lighter version largely suffices.
    194192    Otherwise, given the high load expected, a more careful choice should be made.
     
    201199changed: at least in the !VirtualBox case, no two disk images (globally) can
    202200have the same UUID. Luckily this can be quickfixed, taking into account we
    203 are looking for the following pattern: 
    204 
    205 {{{
    206 dgquintas@portaca:$ grep -n -a -m 1 "uuid.image" cernvm-1.2.0-x86.vmdk 
     201are looking for the following pattern:
     202
     203{{{
     204dgquintas@portaca:$ grep -n -a -m 1 "uuid.image" cernvm-1.2.0-x86.vmdk
    20720520:ddb.uuid.image="ef98873f-7954-4ed8-919a-aae7fb7443a8"
    208206}}}
     
    210208Notice the -m 1 flag, to avoid going through the many megabytes the file is
    211209worth. Modifications of this UUID can be trivially performed in place
    212 by using, for instance, sed. 
     210by using, for instance, sed.
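As an alternative to sed, the same fix can be sketched in Python. The sketch below assumes the `ddb.uuid.image` entry lives within the first 64 KiB of the image (the grep output above finds it on line 20); that size and the function names are assumptions. Because a textual UUID always has 36 characters, the replacement keeps the file length and all offsets unchanged.

```python
import re
import uuid

# Matches the descriptor entry shown in the grep output above.
UUID_ENTRY = re.compile(rb'(ddb\.uuid\.image=")[0-9a-fA-F-]{36}(")')


def replace_image_uuid(header, new_uuid=None):
    """Return 'header' with the first image UUID swapped for a new one."""
    if new_uuid is None:
        new_uuid = uuid.uuid4()
    replacement = str(new_uuid).encode("ascii")
    return UUID_ENTRY.sub(
        lambda match: match.group(1) + replacement + match.group(2),
        header,
        count=1,
    )


def patch_image_file(path, new_uuid=None, header_size=64 * 1024):
    """Rewrite the UUID in place, touching only the start of the file."""
    with open(path, "r+b") as image:
        header = image.read(header_size)   # assumption: the entry is in here
        image.seek(0)
        image.write(replace_image_uuid(header, new_uuid))
```

Calling `patch_image_file("cernvm-1.2.0-x86.vmdk")` on a copy of the image would give that copy a fresh random UUID.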
    213211
    214212
     
    217215
    218216  === Overview ===
    219   Upon initialization, guests connect to the broker, that's expected to listen on the 
    220   default STOMP port 61613 at the guest's gateway IP. 
     217  Upon initialization, each guest connects to the broker, which is expected to listen on the
     218  default STOMP port 61613 at the guest's gateway IP.
    221219  Once connected, it "shouts out" that it has joined the party, providing its unique id (see
    222220  following section for details). Upon reception, the BOINC host notes down this unique id for
    223   further unicast communication (in principle, other guests don't need this information). The 
     221  further unicast communication (in principle, other guests don't need this information). The
    224222  host acknowledges the new guest (using the STOMP-provided ack mechanisms).
    225223
    226224  Two channels are defined for the communication between host agent and VMs: the
    227225  connection and the command channels (these conceptual "channels" are actually
    228   a set of STOMP topics. Refer to [http://bitbucket.org/dgquintas/boincvm/src/tip/destinations.py the source] 
     226  a set of STOMP topics. Refer to [http://bitbucket.org/dgquintas/boincvm/src/tip/destinations.py the source]
    229227  for their actual string definition).
    230228
    231229
    232230  === Unique Identification of Guests ===
    233   The preferred way to identify guests is based simply on their IP.
     231  The preferred way to identify guests is by their name, as assigned by the hypervisor. This presents a problem, as the VMs themselves are internally unaware of their own name. A "common ground" is needed in order to work around this problem.
     232
     233The MAC address of the host-only virtual network card will be the common piece of data, unique and known by both the VM and the hypervisor/host system, that will enable us to establish an unequivocal mapping between the VM and "the outside world". !VirtualBox ensures that this MAC address is unique in the virtual network. The OS inside the VM has access to it (as part of the properties of the virtual network interface), and it is also available through the VirtualBox API, completing the circle.
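The mapping described above can be sketched as a plain dictionary lookup. The sketch assumes that the two sides may format the same MAC address differently (for instance with or without `:` separators), so both are normalized first; the function names, VM names and addresses are hypothetical.

```python
HEX_DIGITS = set("0123456789abcdef")


def normalize_mac(mac):
    """Reduce a MAC address to 12 lowercase hex digits without separators."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError("not a MAC address: %r" % (mac,))
    return digits


def build_mac_to_name(vms):
    """Index VM names by MAC, given (name, mac) pairs from the hypervisor."""
    return {normalize_mac(mac): name for name, mac in vms}


def name_for_guest(mac_to_name, announced_mac):
    """Resolve the VM name for the MAC a guest announced, or None."""
    return mac_to_name.get(normalize_mac(announced_mac))


# Hypothetical data: what the hypervisor side might report...
known = build_mac_to_name([("worker-1", "080027A1B2C3"), ("worker-2", "080027D4E5F6")])
# ...and what a guest might announce when it joins.
resolved = name_for_guest(known, "08:00:27:a1:b2:c3")
```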
    234234
    235235  === VM Aliveness ===
     
    242242  The whole custom made protocol syntax is encapsulated in the
    243243  classes of the "words" package. These words correspond
    244   to this protocol's commands, which are always encoded as 
     244  to this protocol's commands, which are always encoded as
    245245  the first single word of the exchanged STOMP messages.
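A minimal sketch of this "first word" convention: the message body starts with the word, optionally followed by a JSON payload, as in the `CMD_RESULTS` word. The helper names are assumptions and do not mirror the classes of the "words" package.

```python
import json


def encode_word(word, payload=None):
    """Build a message body: the word, then an optional JSON payload."""
    if payload is None:
        return word
    return "%s %s" % (word, json.dumps(payload))


def decode_word(body):
    """Split a message body into (word, payload); payload may be None."""
    word, _, rest = body.strip().partition(" ")
    payload = json.loads(rest) if rest.strip() else None
    return word, payload


# 'cmd-id' and 'out' are keys of the results dictionary; values are made up.
message = encode_word("CMD_RESULTS", {"cmd-id": "42", "out": "hello\n"})
word, results = decode_word(message)
```

A dispatcher can then route on `word` alone and hand `results` to whichever handler is registered for it.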
    246246
     
    267267
    268268{{{
    269 BODY: 
     269BODY:
    270270  CMD_RUN
    271271}}}
     
    281281
    282282{{{
    283 BODY: 
     283BODY:
    284284  CMD_RESULTS <json-ed dict. of results>
    285285}}}
    286286
    287       This word requires a bit more explanation. 
     287      This word requires a bit more explanation.
    288288      Its body encodes the command execution results as
    289       a dictionary with the following keys: 
    290 
    291 {{{
    292 results: 
    293   { 
     289      a dictionary with the following keys:
     290
     291{{{
     292results:
     293  {
    294294    'cmd-id': same as in the word headers
    295295    'out': stdout of the command
     
    340340== API Accessibility ==
    341341The host agent functionalities are made accessible through an XML-RPC
    342 based API. This choice aims to provide a simple yet fully functional, 
     342based API. This choice aims to provide a simple yet fully functional,
    343343standard and multiplatform mechanism of communication between this
    344 agent and the outside world, namely the BOINC wrapper. 
     344agent and the outside world, namely the BOINC wrapper.
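To illustrate the idea, here is a small sketch of exposing an object over XML-RPC with Python's standard library. The class and its methods (`register_guest`, `list_guests`) are purely hypothetical and are not the host agent's actual API, which is built on Twisted.

```python
from xmlrpc.server import SimpleXMLRPCServer


class HostAgentAPI:
    """Hypothetical facade; method names are illustrative only."""

    def __init__(self):
        self._guests = {}

    def register_guest(self, name, ip):
        """Remember a guest and the IP it announced."""
        self._guests[name] = ip
        return True  # XML-RPC needs a return value other than None

    def list_guests(self):
        """Return the names of all known guests."""
        return sorted(self._guests)


def build_server(api, address=("127.0.0.1", 0)):
    """Create an XML-RPC server exposing the public methods of 'api'."""
    server = SimpleXMLRPCServer(address, logRequests=False)
    server.register_instance(api)
    return server
```

A client such as the BOINC wrapper would connect with `xmlrpc.client.ServerProxy("http://host:port")` and call the exposed methods as if they were local.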
    345345
    346346
    347347== Dependencies ==
    348348This section enumerates the external packages (ie, not included in the
    349 standard python distribution) used. The version used during development 
     349standard python distribution) used. The version used during development
    350350is given in parentheses.
    351351
    352   * [http://pypi.python.org/pypi/netifaces/0.5 Netifaces] (0.5) 
     352  * [http://pypi.python.org/pypi/netifaces/0.5 Netifaces] (0.5)
    353353  * [http://code.google.com/p/stomper/ Stomper] (0.2.2)
    354   * [http://twistedmatrix.com/ Twisted] (8.2.0), which indirectly requires 
     354  * [http://twistedmatrix.com/ Twisted] (8.2.0), which indirectly requires
    355355     [http://www.zope.org/Products/ZopeInterface Zope Interfaces] (3.5.1)
    356356  * [http://code.google.com/p/simplejson/ simplejson] (2.0.9). Note that this
     
    361361== Miscellaneous Features ==
    362362  * Multiplatform: it runs wherever a python runtime is available. All
    363   the described dependencies are likewise portable. 
     363  the described dependencies are likewise portable.
    364364  * Fully asynchronous. Thanks to the usage of the Twisted framework, the
    365   whole system developed is seamlessly multithreaded, even though no 
     365  whole system developed is seamlessly concurrent, even though no
    366366  threads are used (in the developed code at least). Instead, all the
    367   operations rely on the asynchronous nature of the Twisted mechanism, 
    368   about which details are given 
     367  operations rely on the asynchronous nature of the Twisted mechanism,
     368  about which details are given
    369369  [http://twistedmatrix.com/projects/core/documentation/howto/async.html here].
    370370
     
    372372Because actions speak louder than words, a prototype illustrating the previous
    373373points has been developed. Bear in mind that, while functional, this is a
    374 proof of concept and surely can be much improved. 
     374proof of concept and surely can be much improved.
    375375
    376376=== Structure ===
    377377[[Image(classDiagram.png)]]
    378378In the previous class diagram special attention should be paid to the classes
    379 of the "words" package: they encompass the logic of the implemented protocol. 
    380 The `Host` and `VM` classes model the host agent and the VMs, respectively. 
     379of the "words" package: they encompass the logic of the implemented protocol.
     380The `Host` and `VM` classes model the host agent and the VMs, respectively.
    381381Classes with a yellow background support the underlying STOMP
    382 architecture. 
     382architecture.
    383383`CmdExecuter` deals with the bookkeeping involved in the execution of
    384384commands. `MsgInterpreter` takes care of routing the messages received by
     
    391391Several aspects can be configured, on three fronts:
    392392
    393 * Broker: 
     393* Broker:
    394394  * `host`: the host where the broker's running
    395395  * `port`: port the broker's listening on
     
    397397  * `password`: broker auth.
    398398
    399 * Host: 
     399* Host:
    400400  * `chirp_path`: absolute path (including /bin) of the chirp tools
    401401  * `xmlrpc_listen_on`: on which interface to listen for XML-RPC requests.
     
    407407
    408408  The configuration file follows
    409   [http://docs.python.org/library/configparser.html Python's !ConfigParser] syntax, and its latest
    410   version can be found 
     409  [http://docs.python.org/library/configparser.html Python's ConfigParser] syntax, and its latest
     410  version can be found
    411411  [http://bitbucket.org/dgquintas/boincvm/src/tip/config.cfg here].
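For orientation, reading such a file could look like the sketch below. The option names come from the lists above, but the section names (`Broker`, `Host`) and all values are assumptions; the linked `config.cfg` remains the authoritative layout.

```python
import configparser

# Hypothetical contents; section names and values are assumptions.
SAMPLE_CONFIG = """
[Broker]
host = localhost
port = 61613
password = secret

[Host]
chirp_path = /usr/local/bin
xmlrpc_listen_on = localhost
"""


def load_settings(text):
    """Parse the configuration text into a flat dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "broker_host": parser.get("Broker", "host"),
        "broker_port": parser.getint("Broker", "port"),
        "broker_password": parser.get("Broker", "password"),
        "chirp_path": parser.get("Host", "chirp_path"),
        "xmlrpc_listen_on": parser.get("Host", "xmlrpc_listen_on"),
    }


settings = load_settings(SAMPLE_CONFIG)
```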
    412412
    413413=== Download and Usage ===
    414 The current source code can be browsed as a 
     414The current source code can be browsed as a
    415415[http://bitbucket.org/dgquintas/boincvm/ mercurial repository], or downloaded from that same webpage.
    416 In addition, the packages described in [#Dependencies the dependencies
    417 section] must be installed as well.
     416In addition, the packages described in [#Dependencies the dependencies section] must be installed as well.
    418417
    419418Starting up the host agent amounts to:
    420419
    421420{{{
    422   dgquintas@portaca:~/.../$ python HostMain.py config.cfg 
     421  dgquintas@portaca:~/.../$ python HostMain.py config.cfg
    423422}}}
    424423
     
    431430Of course, a broker must be running on the host and port defined in the
    432431configuration file being used, [#Configuration as described]. During
    433 development, [http://activemq.apache.org/ ActiveMQ 5.2.0] has been used, 
     432development, [http://activemq.apache.org/ ActiveMQ 5.2.0] has been used,
    434433but [http://stomp.codehaus.org/Brokers any other] should be fine as well.
    435434
     
    448447solution to interact with a set of independent and loosely coupled machines
    449448from a single entry point (the host agent). In our case, this translates to
    450 virtual machines running under a given hypervisor, but it could very well be 
     449virtual machines running under a given hypervisor, but it could very well be
    451450a more traditional distributed computing setup, such as a cluster of machines
    452451that could take advantage of the "chatroom" nature of the implemented
    453 mechanism. 
     452mechanism.
    454453While some of the features this infrastructure offers could be regarded as
    455454already covered by the hypervisor API (as in the !VmWare's VIX API for command
    456455execution), the flexibility and granularity we attain is far greater: by means
    457456of the "words" of the implemented STOMP based protocol, we have ultimate
    458 access to the VMs, to the extend allowed by the Python runtime. 
     457access to the VMs, to the extent allowed by the Python runtime.
    459458
    460459
     
    464463    completely operate with the wrapped VM-based computations.
    465464  * Possibly implement more specialized operations, such as resource usage
    466     querying on-the-fly while the process is still running.
    467 
    468 
    469 
     465    querying on-the-fly while the process is still running.
     466
     467
     468
     469