Changes between Version 17 and Version 18 of VirtualBox

Author:
dgquintas (IP: 137.138.182.211)
Timestamp:
10/20/09 04:55:25 (1 month ago)
Comment:

Updated to latest version

  • VirtualBox

    v17 v18  
    55== "Logistic" advantages == 
    66 
    7  1. One order of magnitude lighter, both its installation package (~35 MB) and 
    8  its installed size (~60 MB). Compare with the 500+ MB of VMWare Server 2.0, 
     7 1. One order of magnitude lighter, both its installation package (~70 MB) and 
     8 its installed size (~80 MB). Compare with the 500+ MB of VMWare Server 2.0, 
    99 which grows by some 150 extra MB when installed. 
    1010 1. License. Its OSE (Open Source Edition) is published under the GPL v.2, but 
    1111 even the non-libre version -PUEL, 
    12  [http://www.virtualbox.org/wiki/VirtualBox_PUEL Personal Use and Evaluation 
    13  License]- could be used for our purposes, but that's something to be checked 
    14  by someone who actually knows something about licensing, unlike myself.  
     12 [http://www.virtualbox.org/wiki/VirtualBox_PUEL Personal Use and Evaluation License]- could be used for our purposes, but that's something to be checked 
     13 by someone who actually knows something about licensing, unlike myself. 
    1514 1. Faster and "less painful" installation process, partly due to its lighter 
    1615 weight. No license number required, hence less hassle for the user. 
    2019The interaction with the VM is made possible even from the command line, in 
    2120particular from the single command `VBoxManage` (extensive doc available in 
    22 [http://download.virtualbox.org/virtualbox/2.2.2/UserManual.pdf the manual]). Of 
    23 particular interest for us are the following VBoxManager's arguments: 
     21[http://www.virtualbox.org/manual/UserManual.html the manual]). The following `VBoxManage` arguments are particularly interesting: 
    2422    - startvm 
    2523    - controlvm  pause|resume|reset|poweroff|savestate ... 
    2927    - registervm 
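
As a minimal sketch of how these subcommands could be driven from Python, the helpers below only build the argument vectors for `startvm` and `controlvm`. The VM name in the usage note is hypothetical, and `run` assumes `VBoxManage` is on the PATH.

```python
import subprocess

# Actions accepted by `VBoxManage controlvm`, as listed above.
CONTROL_ACTIONS = {"pause", "resume", "reset", "poweroff", "savestate"}


def startvm_argv(vm_name):
    """Build the argument vector for `VBoxManage startvm <name>`."""
    return ["VBoxManage", "startvm", vm_name]


def controlvm_argv(vm_name, action):
    """Build the argument vector for `VBoxManage controlvm <name> <action>`."""
    if action not in CONTROL_ACTIONS:
        raise ValueError("unsupported controlvm action: %r" % action)
    return ["VBoxManage", "controlvm", vm_name, action]


def run(argv):
    """Run a prepared command; requires VBoxManage to be on the PATH."""
    return subprocess.run(argv, capture_output=True, text=True, check=True)
```

For instance, `run(controlvm_argv("cernvm", "pause"))` would pause a VM registered under the (hypothetical) name `cernvm`.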
    3028 
    31 All the functionalities exposed by this command are also available throughout 
    32 a C++ COM/XPCOM based API, as well as Python bindings. However, the `VBoxManage`  
    33 is already ported to several platforms and it's flexible enough as to be relied on 
    34 to interact with !VirtualBox. 
     29All the functionality exposed by this command is also available through 
     30a C++ COM/XPCOM based API, Python bindings and SOAP based web services. 
    3531 
    3632Following the capabilities enumeration introduced by Kevin, !VirtualBox would 
    3733compare to his analysis based on VMWare Server as follows: 
    3834 
    39  1. Manage the Image.  Covered by the "`snapshot`" command  
    40  1. Boot the virtual machine. Covered by "`startvm`"  
     35 1. Manage the Image.  Covered by the "`snapshot`" command 
     36 1. Boot the virtual machine. Covered by "`startvm`" 
    4137 1. Copy files host -> guest: '''Not''' directly supported by the !VirtualBox API. 
    4238 We'd need to resort to external solutions 
    4339 such as the one detailed below based on [http://www.cs.wisc.edu/condor/chirp/ Chirp]. 
    4440 1. Run a program on the guest. Same as 3. 
    45  1. Pause and the guest. Covered by "`controlvm pause/resume`"  
     41 1. Pause and resume the guest. Covered by "`controlvm pause/resume`" 
    4642 1. Retrieve files from the guest.  See 3 and 4, same situation. 
    4743 1. Shut down the guest. Covered by "`controlvm poweroff`" 
    4945 
    5046== Bindings == 
    51 In case the direct usage of the `VBoxManage` command wouldn't be appropriate, 
    52 it's possible to fallback to the low-level API. 
    53 Both VMWare Server and !VirtualBox make available C/C++ APIs, as well as 
    54 Python, with different levels of support -in case of VMWare, it's an 
    55 unsupported project.  !VirtualBox's API is based on COM/XPCOM, and it's 
    56 possible to implement a unified windows/linux approach based on the former 
    57 technology. The actual code implementing the [http://www.virtualbox.org/browser/trunk/src/VBox/Frontends/VBoxManage VBoxManage] 
    58 command is a very good reference. 
    59 Therefore, implementing a "hypervisor abstraction layer" is in principle 
    60 feasible, with a common win/linux codebase both for VIX and !VirtualBox API. 
     47Despite `VBoxManage` being an excellent debugging and testing tool, it's not enough for our purposes. We'll need access to some 
     48deeper structures not made available to such a high level tool. 
     49 
     50The question now is which of the available bindings to use. 
     51VirtualBox's API is ultimately based on COM/XPCOM. It'd be 
     52possible to implement a unified windows/linux approach based on these technologies, as demonstrated by the aforementioned 
     53[http://www.virtualbox.org/browser/trunk/src/VBox/Frontends/VBoxManage VBoxManage] command. On the other hand, this isn't a simple task, full 
     54of quirks and platform specific pitfalls (COM is used on Windows, whereas Linux and presumably Mac OS X resort to XPCOM). 
     55 
     56The Python bindings sound promising. Unfortunately, they aren't distributed with most of the pre-built binaries available at the !VirtualBox webpage.  
     57 
     58We are left with the SOAP based web services. This is a sufficiently well known mechanism as to have proper support on the three supported systems. Moreover, the [http://dlc.sun.com/virtualbox/vboxsdkdownload.html VirtualBox SDK] includes a good deal of Python code tailored for interacting with it. 
     59This is the way the current implementation has gone. 
     60 
    6161 
    6262== Interacting with the VM Appliance == 
    8282== Introduction == 
    8383In previous sections, two limitations of the API offered by !VirtualBox 
    84 were pointed out. Namely, the inability to directly support the  
    85 execution of command and file copying between the host and the guest.  
     84were pointed out. Namely, the inability to directly support the 
     85execution of commands and file copying between the host and the guest. 
    8686While relatively straightforward solutions exist, notably the usage of SSH, 
    8787they raise issues of their own: the guest needs to (properly) configure this 
    9090Thus, the requirements for a satisfactory solution would include: 
    9191 
    92   * Minimal or no configuration required on the guest side.  
    93   * No assumptions on the network reachability of the guest. Ideally,  
     92  * Minimal or no configuration required on the guest side. 
     93  * No assumptions on the network reachability of the guest. Ideally, 
    9494    guests should be isolated from "the outside world" as much as possible. 
    9595 
    9797 
    9898  * Scalability. The solution should account for the execution of an arbitrary 
    99     number of guests on a given host.  
     99    number of guests on a given host. 
    100100  * Technology agnostic: dependencies on any platform/programming 
    101101    language/hypervisor should be kept to a minimum or avoided altogether. 
    103103 
    104104== Proposed Solution == 
    105 Following Predrag Buncic's advice, I began looking into such a solution based on 
    106 asynchronous message passing. In order to keep the footprint, both on the host and the guest sides, 
    107 the [http://stomp.codehaus.org/Protocol STOMP protocol]  
    108 came to mind. The protocol is simple enough as to have implementations in a 
    109 large number of programming languages, while fulfilling all flexibility needs. Despite its  
     105A very promising solution based on asynchronous message passing was proposed by Predrag Buncic. 
     106The lightweight [http://stomp.codehaus.org/Protocol STOMP protocol] has been considered, in order  
     107to keep the footprint small. This protocol is simple enough to have implementations in a 
     108large number of programming languages, while still fulfilling all flexibility needs. Despite its 
    110109simplicity and being relatively unheard of, ActiveMQ supports it out-of-the-box (even though 
    111110it'd be advisable to use something lighter for a broker). 
    113112Focusing on the problem at hand, we need to tackle the following problems: 
    114113 
    115   * Command execution on the guest 
     114  * Command execution on the guest (+ resource usage accounting for proper crediting). 
    116115  * File transfer from the host to the guest 
    117116  * File transfer from the guest to the host 
    125124  host and the guests need to share some knowledge about the broker's location, if it's going 
    126125  to be running on an independent machine. Otherwise, it can be assumed that it listens on the 
    127   host's IP. Moreover, this can always be assumed if an appropriate port forwarding mechanism  
    128   is put in place in the host in order to route the connections to the broker.  
     126  host's IP. Moreover, this can always be assumed if an appropriate port forwarding mechanism 
     127  is put in place in the host in order to route the connections to the broker. 
    129128  
    130   The recent release of the 2.2 series of !VirtualBox is a very convenient one: the newly introduced 
    131   host-only networking feature fits our needs like a glove. From  
    132   [http://download.virtualbox.org/virtualbox/2.2.2/UserManual.pdf the manual] (section 6.7): 
     129  The addition, in version 2.2, of the host-only networking feature was really convenient. From 
     130  [http://www.virtualbox.org/manual/UserManual.html#network_hostonly the relevant section] of the manual: 
    133131 
    134132    Host-only networking is another networking mode that was added with version 2.2 
    146144    virtual machines cannot be seen, the traffic on the “loopback” interface on the host 
    147145    can be intercepted. 
    148      
     146    
    149147  That is to say, we have our own virtual "ethernet network". On top of that, !VirtualBox 
    150148  provides an easily configurable DHCP server that makes it possible to set a fixed IP for the 
    158156  message passing infrastructure: a tailored message addressed to the guest we want to 
    159157  run the command on is published, processed by this guest and eventually answered back 
    160   with some sort of status (maybe even periodically in order to feedback about progress).  
     158  with some sort of status (maybe even periodically, in order to give feedback about progress). 
    161159 
    162160  Given the subscription-based nature of the system, several guests can be addressed at 
    163161  once by a single host, triggering the execution of commands (or any other action 
    164   covered by this mechanism) in a single go. Note that neither the hosts nor the  
     162  covered by this mechanism) in a single go. Note that neither the hosts nor the 
    165163  (arbitrary number of) guests need to know how many of the latter make up the system: 
    166164  new guest instances need only subscribe to these "broadcasted" messages on their own 
    170168  === File Transfers === 
    171169  This is a trickier feature: transfers must be bidirectional, yet we want to avoid any kind 
    172   of exposure or (complex) configuration.  
     170  of exposure or (complex) configuration. 
    173171 
    174172  The proposed solution takes advantage of the [http://www.cse.nd.edu/~ccl/software/chirp/ Chirp protocol and set of tools]. 
    175173  This way, we don't even require privileges to launch the server instances. Because 
    176174  the file sharing must remain private, the chirp server is run on the guests. The host agent 
    177   would act as a client that'd send or retrieve files. We spare ourselves from all the  
     175  would act as a client that'd send or retrieve files. We spare ourselves from all the 
    178176  gory details involved in the actual management of the transfers, delegating the job 
    179177  to chirp (which deals with it brilliantly, by the way). 
    180178 
    181   The only bit missing in this argumentation is that the host needs to be aware of the guests'  
    182   IP addresses in order to communicate with these chirp servers. This is a no-issue, as the  
     179  The only bit missing in this argument is that the host needs to be aware of the guests' 
     180  IP addresses in order to communicate with these chirp servers. This is a non-issue, as the 
    183181  custom STOMP-based protocol implemented makes it possible for the guests to "shout out" their 
    184182  details so that the host can keep track of every single one of them. 
    188186  * Where should the broker live? Conveniently on the same machine as the hypervisor or on 
    189187    a third host? Maybe even a centralized and widely known (ie, standard) one? This last option 
    190     might face congestion problems, though.  
    191   * Broker choice. Full-fledged ([http://activemq.apache.org/ ActiveMQ]) or more limited but lighter?  
    192     (ie, [http://www.germane-software.com/software/Java/Gozirra/ Gozirra]). On this  
     188    might face congestion problems, though. 
     189  * Broker choice. Full-fledged ([http://activemq.apache.org/ ActiveMQ]) or more limited but lighter? 
     190    (ie, [http://www.germane-software.com/software/Java/Gozirra/ Gozirra]). On this 
    193191    question, unless a centralized broker is universally used, the lighter version largely suffices. 
    194192    Otherwise, given the high load expected, a more careful choice should be made. 
    201199changed: at least in the !VirtualBox case, no two disk images (globally) can 
    202200have the same UUID. Luckily this can be quickfixed, taking into account we 
    203 are looking for the following pattern:  
    204  
    205 {{{ 
    206 dgquintas@portaca:$ grep -n -a -m 1 "uuid.image" cernvm-1.2.0-x86.vmdk  
     201are looking for the following pattern: 
     202 
     203{{{ 
     204dgquintas@portaca:$ grep -n -a -m 1 "uuid.image" cernvm-1.2.0-x86.vmdk 
    20720520:ddb.uuid.image="ef98873f-7954-4ed8-919a-aae7fb7443a8" 
    208206}}} 
    210208Notice the -m 1 flag, to avoid going through the many megabytes the file is 
    211209worth. Modifications of this UUID can be trivially performed in place 
    212 by using, for instance, sed.  
     210by using, for instance, sed. 
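
As an alternative to sed, the same in-place fix can be sketched in Python. This is only an illustration: it works on a bytes buffer holding the descriptor text (reading a whole multi-megabyte image into memory would be wasteful), and the function name is made up for the example. The replacement UUID has the same length as the original, so nothing else in the file shifts.

```python
import re
import uuid

# Pattern of the descriptor entry shown in the grep output above.
UUID_ENTRY = re.compile(rb'(ddb\.uuid\.image=")([0-9a-fA-F-]{36})(")')


def replace_image_uuid(data, new_uuid=None):
    """Return (patched_data, new_uuid) with the first ddb.uuid.image replaced.

    The new value has the same length as the old one, so no byte offsets
    in the image change.
    """
    new_uuid = new_uuid or str(uuid.uuid4())
    if len(new_uuid) != 36:
        raise ValueError("expected a 36-character UUID string")
    patched, count = UUID_ENTRY.subn(
        lambda m: m.group(1) + new_uuid.encode("ascii") + m.group(3),
        data,
        count=1,
    )
    if count == 0:
        raise ValueError("no ddb.uuid.image entry found")
    return patched, new_uuid
```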
    213211 
    214212 
    217215 
    218216  === Overview === 
    219   Upon initialization, guests connect to the broker, that's expected to listen on the  
    220   default STOMP port 61613 at the guest's gateway IP.  
     217  Upon initialization, guests connect to the broker, which is expected to listen on the 
     218  default STOMP port 61613 at the guest's gateway IP. 
    221219  Once connected, each guest "shouts out" that it has joined the party, providing its unique id (see 
    222220  following section for details). Upon reception, the BOINC host notes down this unique id for 
    223   further unicast communication (in principle, other guests don't need this information). The  
     221  further unicast communication (in principle, other guests don't need this information). The 
    224222  host acknowledges the new guest (using the STOMP-provided ack mechanisms). 
    225223 
    226224  Two channels are defined for the communication between host agent and VMs: the 
    227225  connection and the command channels (these conceptual "channels" are actually 
    228   a set of STOMP topics. Refer to [http://bitbucket.org/dgquintas/boincvm/src/tip/destinations.py the source]  
     226  a set of STOMP topics. Refer to [http://bitbucket.org/dgquintas/boincvm/src/tip/destinations.py the source] 
    229227  for their actual string definition). 
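
To make the exchange above more concrete, here is a rough sketch of the STOMP frames a guest could send when it connects and announces itself. The frame layout (command, headers, blank line, body, NUL byte) follows the STOMP protocol; the topic name and the `HELLO` word are placeholders invented for this example, not the strings used in the actual source.

```python
BROKER_PORT = 61613  # default STOMP port mentioned above
CONNECTION_TOPIC = "/topic/example.connection"  # hypothetical destination


def build_frame(command, headers=None, body=""):
    """Serialize a STOMP frame: command, headers, blank line, body, NUL."""
    lines = [command]
    for key, value in (headers or {}).items():
        lines.append("%s:%s" % (key, value))
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")


def connect_frame(login, passcode):
    """First frame a client sends after opening the TCP connection."""
    return build_frame("CONNECT", {"login": login, "passcode": passcode})


def announce_frame(guest_id):
    """The "shout out": publish the guest's unique id on the connection channel."""
    # "HELLO" is a placeholder word, not taken from the real protocol.
    return build_frame("SEND", {"destination": CONNECTION_TOPIC}, "HELLO " + guest_id)
```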
    230228 
    231229 
    232230  === Unique Identification of Guests === 
    233   The preferred way to identify guests is based simply on their IP. 
     231  The preferred way to identify guests is by their name, as assigned by the hypervisor. This presents a problem, as the VMs themselves are internally unaware of their own name. A "common ground" is needed in order to work around this problem.  
     232 
     233The MAC address of the host-only virtual network card will be the common piece of data, unique and known by both the VM and the hypervisor/host system, that will enable us to establish an unequivocal mapping between the VM and "the outside world". !VirtualBox ensures that this MAC address is unique within the virtual network. The OS inside the VM has access to it (as part of the properties of the virtual network interface), and so does the host through the VirtualBox API, completing the circle. 
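
A minimal sketch of this mapping, assuming the host already holds a dictionary of VM names and MAC addresses obtained from the hypervisor and that each guest reports the MAC of its own interface. The function names and the sample address are illustrative only. Since the two sides may format the address differently (with or without separators, upper or lower case), it is normalized before being compared.

```python
import re


def normalize_mac(mac):
    """Reduce a MAC address to 12 upper-case hex digits."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(digits) != 12:
        raise ValueError("not a MAC address: %r" % mac)
    return digits.upper()


def build_mac_index(vm_macs):
    """vm_macs maps VM name -> MAC address as reported by the hypervisor."""
    return {normalize_mac(mac): name for name, mac in vm_macs.items()}


def vm_name_for(index, guest_mac):
    """Resolve the MAC announced by a guest to its VM name, or None."""
    return index.get(normalize_mac(guest_mac))
```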
    234234 
    235235  === VM Aliveness === 
    242242  The whole custom made protocol syntax is encapsulated in the 
    243243  classes of the "words" package. Each of these words corresponds 
    244   to this protocol's commands, which are always encoded as  
     244  to this protocol's commands, which are always encoded as 
    245245  the first single word of the exchanged STOMP messages. 
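
The "first word is the command" convention can be sketched as follows. This is not the code of the "words" package, just an illustration of the encoding: the word comes first and an optional JSON payload follows, as in the `CMD_RUN` and `CMD_RESULTS` bodies shown in this page. The helper names are made up for the example.

```python
import json


def encode_word(word, payload=None):
    """Build a message body: the word, optionally followed by a JSON payload."""
    if payload is None:
        return word
    return "%s %s" % (word, json.dumps(payload))


def decode_word(body):
    """Split a message body into (word, payload); payload is None if absent."""
    word, _, rest = body.strip().partition(" ")
    payload = json.loads(rest) if rest.strip() else None
    return word, payload
```

A receiver would then dispatch on the returned word, for example to the class of the "words" package that implements it.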
    246246 
    267267 
    268268{{{ 
    269 BODY:  
     269BODY: 
    270270  CMD_RUN 
    271271}}} 
    281281 
    282282{{{ 
    283 BODY:  
     283BODY: 
    284284  CMD_RESULTS <json-ed dict. of results> 
    285285}}} 
    286286 
    287       This word requires a bit more explanation.  
     287      This word requires a bit more explanation. 
    288288      Its body encodes the command execution results as 
    289       a dictionary with the following keys:  
    290  
    291 {{{ 
    292 results:  
    293   {  
     289      a dictionary with the following keys: 
     290 
     291{{{ 
     292results: 
     293  { 
    294294    'cmd-id': same as in the word headers 
    295295    'out': stdout of the command 
    340340== API Accessibility == 
    341341The host agent functionalities are made accessible through an XML-RPC 
    342 based API. This choice aims to provide a simple yet fully functional,  
     342based API. This choice aims to provide a simple yet fully functional, 
    343343standard and multiplatform mechanism of communication between this 
    344 agent and the outside world, namely the BOINC wrapper.  
     344agent and the outside world, namely the BOINC wrapper. 
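
To give an idea of what such an API looks like from both ends, the sketch below exposes a single method over XML-RPC and calls it as a client would. It uses Python's standard library rather than the Twisted-based machinery of the prototype, and `list_guests` is a hypothetical method, not one of the agent's real calls.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def list_guests():
    """Hypothetical API method: names of the guests known to the agent."""
    return ["guest-a", "guest-b"]


def start_agent_api(host="127.0.0.1", port=0):
    """Serve the API in a background thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(list_guests)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def call_list_guests(server):
    """What a client such as the BOINC wrapper would do."""
    host, port = server.server_address
    with ServerProxy("http://%s:%d" % (host, port)) as proxy:
        return proxy.list_guests()
```

Because XML-RPC is plain HTTP plus XML, the caller does not need to be written in Python at all.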
    345345 
    346346 
    347347== Dependencies == 
    348348This section enumerates the external packages (i.e., not included in the 
    349 standard python distribution) used. The version used during development  
     349standard python distribution) used. The version used during development 
    350350is given in parentheses. 
    351351 
    352   * [http://pypi.python.org/pypi/netifaces/0.5 Netifaces] (0.5)  
     352  * [http://pypi.python.org/pypi/netifaces/0.5 Netifaces] (0.5) 
    353353  * [http://code.google.com/p/stomper/ Stomper] (0.2.2) 
    354   * [http://twistedmatrix.com/ Twisted] (8.2.0), which indirectly requires  
     354  * [http://twistedmatrix.com/ Twisted] (8.2.0), which indirectly requires 
    355355     [http://www.zope.org/Products/ZopeInterface Zope Interfaces] (3.5.1) 
    356356  * [http://code.google.com/p/simplejson/ simplejson] (2.0.9). Note that this 
    361361== Miscellaneous Features == 
    362362  * Multiplatform: it runs wherever a python runtime is available. All 
    363   the described dependencies are likewise portable.  
     363  the described dependencies are likewise portable. 
    364364  * Fully asynchronous. Thanks to the usage of the Twisted framework, the 
    365   whole system developed is seamlessly multithreaded, even though no  
      365  whole system developed is seamlessly concurrent, even though no 
    366366  threads are used (in the developed code at least). Instead, all the 
    367   operations rely on the asynchronous nature of the Twisted mechanism,  
    368   about which details are given  
     367  operations rely on the asynchronous nature of the Twisted mechanism, 
     368  about which details are given 
    369369  [http://twistedmatrix.com/projects/core/documentation/howto/async.html here]. 
    370370 
    372372Because actions speak louder than words, a prototype illustrating the previous 
    373373points has been developed. Bear in mind that, while functional, this is a 
    374 proof of concept and surely can be much improved.  
     374proof of concept and surely can be much improved. 
    375375 
    376376=== Structure === 
    377377[[Image(classDiagram.png)]] 
    378378In the previous class diagram special attention should be paid to the classes 
    379 of the "words" package: they encompass the logic of the implemented protocol.  
    380 The `Host` and `VM` classes model the host agent and the VMs, respectively.  
     379of the "words" package: they encompass the logic of the implemented protocol. 
     380The `Host` and `VM` classes model the host agent and the VMs, respectively. 
    381381Classes with a yellow background support the underlying STOMP 
    382 architecture.  
     382architecture. 
    383383`CmdExecuter` deals with the bookkeeping involved in the execution of 
    384384commands. `MsgInterpreter` takes care of routing the messages received by 
    391391Several aspects can be configured, on three fronts: 
    392392 
    393 * Broker:  
     393* Broker: 
    394394  * `host`: the host where the broker's running 
    395395  * `port`: port the broker's listening on 
    397397  * `password`: broker auth. 
    398398 
    399 * Host:  
     399* Host: 
    400400  * `chirp_path`: absolute path (including /bin) of the chirp tools 
    401401  * `xmlrpc_listen_on`: on which interface to listen for XML-RPC requests. 
    407407 
    408408  The configuration file follows 
    409   [http://docs.python.org/library/configparser.html Python's !ConfigParser] syntax, and its latest 
    410   version can be found  
     409  [http://docs.python.org/library/configparser.html Python's ConfigParser] syntax, and its latest 
     410  version can be found 
    411411  [http://bitbucket.org/dgquintas/boincvm/src/tip/config.cfg here]. 
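
As an illustration of the syntax, the sketch below parses a configuration of this kind with the standard library (`configparser` in Python 3). The section names `Broker` and `Host` and all the values are assumptions made for the example; only the option names come from the list above, and the linked `config.cfg` remains the reference.

```python
import configparser

# Example configuration text; section names and values are assumptions.
SAMPLE = """\
[Broker]
host = localhost
port = 61613
password = secret

[Host]
chirp_path = /opt/chirp/bin
xmlrpc_listen_on = 127.0.0.1
"""


def load_settings(text):
    """Parse the configuration text into a flat dictionary of settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "broker_host": parser.get("Broker", "host"),
        "broker_port": parser.getint("Broker", "port"),
        "broker_password": parser.get("Broker", "password"),
        "chirp_path": parser.get("Host", "chirp_path"),
        "xmlrpc_listen_on": parser.get("Host", "xmlrpc_listen_on"),
    }
```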
    412412 
    413413=== Download and Usage === 
    414 The current source code can be browsed as a  
     414The current source code can be browsed as a 
    415415[http://bitbucket.org/dgquintas/boincvm/ mercurial repository], or downloaded from that same webpage. 
    416 In addition, the packages described in [#Dependencies the dependencies 
    417 section] must be installed as well.  
     416In addition, the packages described in [#Dependencies the dependencies section] must be installed as well. 
    418417 
    419418Starting up the host agent amounts to: 
    420419 
    421420{{{ 
    422   dgquintas@portaca:~/.../$ python HostMain.py config.cfg  
     421  dgquintas@portaca:~/.../$ python HostMain.py config.cfg 
    423422}}} 
    424423 
    431430Of course, a broker must be running on the host and port defined in the 
    432431configuration file being used, [#Configuration as described]. During 
    433 development, [http://activemq.apache.org/ ActiveMQ 5.2.0] has been used,  
     432development, [http://activemq.apache.org/ ActiveMQ 5.2.0] has been used, 
    434433but [http://stomp.codehaus.org/Brokers any other] should be fine as well. 
    435434 
    448447solution to interact with a set of independent and loosely coupled machines 
    449448from a single entry point (the host agent). In our case, this translates to 
    450 virtual machines running under a given hypervisor, but it could very well be  
     449virtual machines running under a given hypervisor, but it could very well be 
    451450a more traditional distributed computing setup, such as a cluster of machines 
    452451that could take advantage of the "chatroom" nature of the implemented 
    453 mechanism.  
     452mechanism. 
    454453While some of the features this infrastructure offers could be regarded as 
    455454already covered by the hypervisor API (as in the !VmWare's VIX API for command 
    456455execution), the flexibility and granularity we attain is far greater: by means 
    457456of the "words" of the implemented STOMP based protocol, we have ultimate 
    458 access to the VMs, to the extend allowed by the Python runtime.  
     457access to the VMs, to the extent allowed by the Python runtime. 
    459458 
    460459 
    464463    completely operate with the wrapped VM-based computations. 
    465464  * Possibly implement more specialized operations, such as resource usage 
    466     querying on-the-fly while the process is still running.  
    467  
    468  
    469  
     465    querying on-the-fly while the process is still running. 
     466 
     467 
     468 
     469 
