THE HISTORY OF THE "BOINC_VM PROJECT" IN SELECTED EMAILS SINCE SEPTEMBER 2008 (B. Segal):

From: David Anderson
Date: 15 September 2008 18:26:41 GMT+02:00
To: Ben Segal
Cc: Bruce Allen, Kevin Reed, Reinhard Prix, Rom Walton, Daniel Lombraña González, David Weir, Predrag Buncic
Subject: Re: BOINC client strategies for virtualized jobs

Ben:

BOINC VM apps will be important to lots of people, and we definitely need to coordinate efforts. I suggest that we use the following Wiki page for the evolving design:

http://boinc.berkeley.edu/trac/wiki/VmApps

If you send me a doc in another format I can put it there. I can devote resources to working on this over the next year, i.e. BOINC client changes and/or the VM wrapper.

-- David

******************

From: Daniel Lombraña González
Date: 9 January 2009 12:30:41 GMT+01:00
To: David James Weir
Cc: Predrag Buncic, Ben Segal
Subject: Re: [Fwd: Re: A BOINC-VM question]

Hi David, Predrag and Ben,

Thanks for sending me your suggestions about BOINC and virtualization. Your comments are really interesting, and all of them seem to point in the right direction. However, I have some minor suggestions and comments.

VMware Server is the best solution, but volunteers could be afraid of installing that software (remember that you have to register yourself on vmware.com in order to obtain a free serial number). So I don't know if VMware Server is a good enough solution for all. Therefore, we should always bear in mind both audiences: institutions like CERN or universities, and volunteers.

The rest of your implementation sounds really good and I would like to test it in our infrastructure, obviously if you are interested. If you have a public repository where I can check out the source code, it would be interesting to start some minor tests with your code. At the same time we are also going to try the HTTP solution for getting the output files from the VM.
That's all, thanks for your comments and happy new year ;)

Daniel

******************

On Sun, Jan 4, 2009 at 3:09 PM, David James Weir wrote:

Hi Daniel (and Predrag),

Ben suggested I forward to you these notes I made about BOINC with Virtual Machines. They incorporate his thinking with my experience trying to get the various bits to work together. I'm at the stage where I will be trying out the approach I give below shortly (as soon as I have some spare time!). I think the major hurdles are now overcome; the big question is whether the approach I outline below will be useful to anyone!

Kind regards,
David Weir

******************

On 2 Jan. 09, at 21:38, David James Weir wrote:

I've had a couple more days to work on the BOINC-VM code; with VMWare server 1 (as discussed earlier) I've got BOINC, VMWare and a wrapper talking to each other, though I haven't (yet) tried running it as a workunit because of issues with 64-bit Linux on my work desktop.

Daniel's idea is very good, and best of all doesn't assume too much about the virtualisation method. We can probably do better with VMWare Server, though: we can use VixVM API calls like FileExistsInGuest() and CopyFileFromGuestToHost(), eliminating the HTTP server altogether! Furthermore, this might be able to eliminate the dependence on any form of network connection between guest and host -- simplifying the configuration of VMWare Server.

I think, though, that the exact way in which we do this is strongly dependent on the "target audience" -- is it still the porting of applications? I'm aware that what follows is essentially a rewrite of the implementation you suggested to me a month or two back, but as I have become familiar with the capabilities of the VIX API I've been able to flesh out some of the details.

Assumptions: no "inner BOINC client", inner application does not link against BOINC libraries, inner application is small with very few dependencies not included in a basic virtual machine.
No assumptions are made about whether the inner application would need access to the network, but the wrapper XML standard could be extended to include a field indicating whether this is necessary.

1. We deploy a virtual machine image over BOINC, with the VMWare Tools installed. This need only be done once for our hypothetical project, with different programs being sent as part of the workunit.

2. The VMWare-aware wrapper powers up the virtual machine, checks for a snapshot relevant to the current workunit with VixVM_GetNamedSnapshot() and VixVM_RevertToSnapshot() (this is our checkpointing recovery step). If one exists, we skip to stage 4.

3. For a given workunit, the wrapper XML standard is then used to send a package containing a non-BOINCified program and dependencies (suppose an RPM for the CernVM). This is installed using a call to RunProgramInGuest() by the VMWare-aware wrapper. This is then also used to start the program running.

4 (main loop). The wrapper polls the process handle returned by RunProgramInGuest() to see if the workunit has finished. It also calls boinc_time_to_checkpoint(); then runs VixVM_CreateSnapshot() to create a snapshot as a checkpoint, if necessary.

5. Upon successful completion of the inner job, we must copy out the results, uninstall the workunit package in the VM and move the checkpointed snapshot. The results are then sent back to the BOINC server. We can get the CPU time easily inside the virtual machine by calling the executable through time(1), say, but it may turn out to be more reasonable to use the cputime estimate made by the core client itself.

Note: For issuing partial credit on long processes we could require the inner program to update a file inside the VM (say /tmp/creditreport) giving the number of floating point and integer operations done so far. This could then be polled in step 4 (the main loop), and the results returned to the core client.
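An illustrative sketch (not part of the original message) of the main loop in step 4, including the credit-report polling suggested in the note. The FakeVM class and its method names are hypothetical stand-ins for whatever hypervisor binding is used; they are not VIX API calls. The time_to_checkpoint callback stands in for boinc_time_to_checkpoint(), and the report file is assumed to hold a single number.

```python
import time


class FakeVM:
    """Hypothetical stand-in for a hypervisor binding (not the VIX API)."""

    def __init__(self, ticks):
        self.ticks_left = ticks   # polls remaining until the inner job ends
        self.snapshots = 0        # number of checkpoints taken so far
        self.ops = 0.0            # operations the inner job claims to have done

    def job_finished(self):
        # Each poll simulates one unit of progress inside the guest.
        self.ticks_left -= 1
        self.ops += 1e9
        return self.ticks_left <= 0

    def create_snapshot(self):
        self.snapshots += 1

    def read_guest_file(self, path):
        # Assumed report format: one number, the operations done so far.
        return f"{self.ops}\n"


def run_workunit(vm, time_to_checkpoint, poll_interval=0.0,
                 report_path="/tmp/creditreport"):
    """Poll the guest until the job ends; return the last reported op count."""
    while True:
        finished = vm.job_finished()
        ops_done = float(vm.read_guest_file(report_path).split()[0])
        if finished:
            return ops_done
        if time_to_checkpoint():
            vm.create_snapshot()  # the snapshot doubles as the checkpoint
        time.sleep(poll_interval)
```

With a three-poll job and a callback that always asks for a checkpoint, the loop takes a snapshot after each of the first two polls and returns the op count reported on the third.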
The same thing would be necessary for reporting the fraction done. We could even re-implement fraction_done() and ops_cumulative(), writing a "fake" BOINC library to be used when compiling projects to run inside virtual machines.

******************

From: Kevin Reed
Date: 20 January 2009 20:34:29 GMT+01:00
To: Ben Segal
Cc: Bruce Allen, David Anderson, David Weir, Derrick Kondo, Francois Grey, Predrag Buncic, Reinhard Prix, Rom Walton, Daniel Lombraña González
Subject: Re: BOINC client strategies for virtualized jobs

Ben et al.,

For your reference, I am attaching what I produced for my operating system course in regards to looking at virtual machines and using them with BOINC. The premise of the paper and study is a little flaky, but the work using the VIX API was decent. There are some rough thoughts about how BOINC could use this but there are a lot of issues out there.

(See attached file: project_reed21.pdf)

***** NOTE FROM BEN: THIS IS NOW AVAILABLE ON THE BOINC WIKI ************

Set up details and code: http://home.comcast.net/~kevin_reed/ worth a little look. It still needs some maturity in my mind (like VMware server had processes that would stop running periodically).

Kevin Reed
ibm interactive
71 S. Wacker Dr
Chicago, IL, 60606-4637
312 529 2802 office
knreed@us.ibm.com email

From: Ben Segal
To: David Anderson
Cc: Bruce Allen, Kevin Reed/Chicago/IBM@IBMUS, Reinhard Prix, Rom Walton, Daniel Lombraña González, David Weir, Predrag Buncic, Francois Grey, Derrick Kondo
Date: 01/20/2009 01:29 PM
Subject: Re: BOINC client strategies for virtualized jobs

Dear Dave (and BOINC colleagues),

It's been a while (4 months) since our last exchange on this topic but now I can happily report some progress. David Weir has found some spare time from his PhD studies and is building a test implementation of a "VM wrapper" along the lines we proposed. He has just put up initial documentation on the Wiki page you created for us:

http://boinc.berkeley.edu/trac/wiki/VmApps

Daniel (in Spain) and we at CERN are looking forward to testing David's code when it's ready.

Any comments from any of you will be welcome!

Best wishes to all and Happy New Year,
Ben

******************

From: David James Weir
Date: 20 January 2009 20:38:10 GMT+01:00
To: Kevin Reed
Cc: Ben Segal, Bruce Allen, David Anderson, Derrick Kondo, Francois Grey, Predrag Buncic, Reinhard Prix, Rom Walton, Daniel Lombraña González
Subject: Re: BOINC client strategies for virtualized jobs

Hi Kevin,

Kevin Reed wrote:
Set up details and code: http://home.comcast.net/~kevin_reed/

It seems to me you've already done almost exactly what I was proposing! (I've just read your followup email... we have!)
David

******************

From: Daniel Lombraña González
Date: 21 January 2009 09:29:48 GMT+01:00
To: Kevin Reed
Cc: Ben Segal, Bruce Allen, David Anderson, David Weir, Derrick Kondo, Francois Grey, Predrag Buncic, Reinhard Prix, Rom Walton
Subject: Re: BOINC client strategies for virtualized jobs

Hi all,

In the end we have two similar solutions, that's good :). I have been reading Kevin's paper. The proposal for integrating BOINC and VMware sounds really good. For me there is only a small issue with using VMware Server and volunteers. Probably, some users will not join us due to the fact that they have to register themselves on vmware.com in order to download VMware Server 2.0 (correct me if I'm wrong).

Besides this, the VMware solution is by far the best solution for institutions like universities or companies, as for them it will be very easy to deploy such a configuration.

As Ben said before, we would like to try this solution with some problems that we have already tested with our previous approach of VMware Player and BOINC. So, as soon as we have tried it we will report the results.

Regards,
Daniel

******************

From: Daniel Lombraña González
Date: 21 January 2009 09:34:00 GMT+01:00
To: David James Weir
Cc: Ben Segal, Predrag Buncic, Francois Grey
Subject: Re: BOINC client strategies for virtualized jobs

Hi,

We now have a new point to work on. The solution proposed by Kevin seems really good, as was yours. In the end, both of them are pretty similar, so I think that we can collaborate in order to improve the code in one direction.

Right now, Kevin's solution only works on Windows machines (the most used OS); however, it would be interesting to add support for Linux and Mac. Thus, it would be interesting to add this work to Kevin's code.

Additionally, we could start discussing whether it would be interesting to modify BOINC to support VMware, or whether we prefer to modify only the wrapper job.xml file to support VMs.
I think this is a really good discussion, so what do you think?

Regards,
Daniel

******************

On Tue, Jan 20, 2009 at 9:30 PM, David James Weir wrote:

Hi Ben,

Ben Segal wrote:
What a surprise! I hope you're not too ticked off about this.

Not at all bothered!

He also has his own ideas about how to integrate this with BOINC. On the positive side it nicely confirms our approach (and shows that VMware Server 2 is OK to use). But it wasn't very comradely of Kevin to have sent no information on his work out to the list or to the Wiki. But it's a very nice job and also a fine discussion of many of the issues, including those around BOINC configuration and management.

Since we've been caught on the back foot, I'm attaching what I've done so far to make a working VMWare wrapper, and I've put the necessary configuration examples below. It's more or less exactly what Kevin's done. The wrapper should compile with the standard wrapper makefile, adding -lvmware-vix, if any of you are interested.

There's much left to be done in my code: error checking, checkpointing, and correct BOINC handling of output files -- the code itself represents an afternoon of thinking and an afternoon of work, more or less.

Note that I reindented the code (by mistake!) so running diff against the original code may not give helpful output!

David

----

(inner.sh)
#!/bin/sh
# cd /tmp
cat foo.dat > bar.dat

(job.xml)
cernvm-1.01-x86/cernvm-1.01-x86 djw03 guestpass /tmp inner.sh 0 0 foo.dat bar.dat

(sample run)
vmwrapper: starting
Connected to server.
Processing task 0.
Task 1.
========
VM file root is cernvm-1.01-x86/cernvm-1.01-x86
Registered virtual machine with the server.
Got handle.
VM is now on (if it wasn't already).
Logged in.
Preparing job...
Copying in input...
Processing: foo.dat
Running main program: done.
Copying out output...
Processing: bar.dat
Sanitising virtual machine...
Removing: foo.dat
Removing executable.
called boinc_finish

******* NOTE FROM BS: DAVID'S CODE IS AVAILABLE AT: http://www.cern.ch/ben/DW_Wrapper_VM.rtf ****************

******************

From: David Anderson
Date: 2 March 2009 20:15:17 GMT+01:00
To: Kevin Reed
Cc: Daniel Lombraña González, Ben Segal, David Weir, Derrick Kondo, Reinhard Prix, Rom Walton
Subject: Re: BOINC client strategies for virtualized jobs

I'd like to move forward with BOINC support for VM apps. In particular, I'd like to add code to the client to detect the presence/version of VM executives and report them to the server.

Was a consensus reached on which VM executive to use for BOINC VM apps? (i.e., one with an API for suspend/resume and for moving files in and out)

Recent postings suggest that the leading candidate is VMWare Server 2, with the VIX API. This has the drawback that its installation process (which volunteers will have to perform) is targeted at IT professionals, and may intimidate average volunteers.

-- David

******************

From: Ben Segal
Date: 2 March 2009 20:37:36 GMT+01:00
To: David Anderson
Cc: Kevin Reed, Daniel Lombraña González, David Weir, Derrick Kondo, Reinhard Prix, Rom Walton, Predrag Buncic
Subject: Re: BOINC client strategies for virtualized jobs

Hi David,

We'd also like to move forward with this but have no programming effort at the moment.

Yes, the consensus is what you say: VMware Server 2 and VIX.

For your information, a copy of David Weir's Linux code, based on his "Proposal" in the BOINC Wiki, is at:

http://www.cern.ch/ben/DW_Wrapper_VM.rtf

This nicely complements Kevin's Windows code (developed independently!).
All the best,
Ben

******************

From: David James Weir
Date: 2 March 2009 21:00:54 GMT+01:00
To: Ben Segal
Cc: David Anderson, Kevin Reed, Daniel Lombraña González, Derrick Kondo, Reinhard Prix, Rom Walton, Predrag Buncic
Subject: Re: BOINC client strategies for virtualized jobs

Hi all,

I should warn you that the above code was just a couple of afternoons of playing around, and I gave up when I realised I was reinventing someone else's wheel!

If we have some forward momentum, I'm happy to lend a hand with whatever eventually happens -- in between doing my PhD.

Cheers,
David

******************

From: Daniel Lombraña González
Date: 3 March 2009 08:32:33 GMT+01:00
To: David James Weir
Cc: Ben Segal, David Anderson, Kevin Reed, Derrick Kondo, Reinhard Prix, Rom Walton, Predrag Buncic
Subject: Re: BOINC client strategies for virtualized jobs

Hi all,

Adding code to detect which hypervisor the PC is using will be an interesting step forward. Detection is the first step in integrating VMs and BOINC.

As Ben has said, VMware Server is the best solution right now, but as you have pointed out it will be a bit difficult for average volunteers. The main problem, from my point of view, is that you need to register yourself on vmware.com, and I think that some volunteers will not like to register with VMware.

Daniel

******************

From: Kevin Reed
Date: 3 March 2009 16:52:26 GMT+01:00
To: Daniel Lombraña González, Ben Segal, David Anderson, David James Weir, Derrick Kondo, Predrag Buncic, Reinhard Prix, Rom Walton
Subject: Re: BOINC client strategies for virtualized jobs

David,

As you look at implementing this I would like to suggest a different approach.

I propose that the BOINC client views a particular host as having a set of 'Resources' and a set of 'Capabilities'.

'Resources' are things like RAM, processing cores, disk space and GPU capabilities. Resources are fundamental to the client's task scheduling and management logic.
A 'Capability', on the other hand, is something that the client has that may or may not allow it to run certain types of jobs. Examples of capabilities are the version of Java it can run, the version of Perl installed or a particular virtual machine being installed on the client.

I think that the BOINC client should have embedded logic that allows it to detect Resources. However, it should also be designed so that each project can provide a small application that will detect the 'Capabilities' of the machine. This app would have a certain API that would report to the core client the set of capabilities that the host has, and these capabilities would be reported back to the project on each scheduler request.

I would expect that BOINC would provide a sample app that detects some common things and then projects can extend it as needed. For each additional capability there will likely be a 'wrapper' application developed that would support that capability. Hopefully projects would contribute their detection logic and wrapper applications back to BOINC and a very nice library could evolve over time.

The reason that this is attractive is that the number of capabilities that projects may wish to use will change frequently. Adding each possibility to the core client will likely result in a lot of bloat in the client over time, and the client will be constantly behind in the features that projects are looking to use.

So if we took this approach the changes that would be needed are:

1. Modify the client to look for the test application if available from a project (needs to handle the multi-platform logic)
2. If one is available, run it after it is downloaded and start reporting the capabilities (which would use the existing server logic to match applications with clients, i.e. like for GPU)
3. Create an example version of this sample application that can detect VMWare Server 2.0
4. Create a wrapper application that can run VMWare Server 2.0

Once this is done, if projects want to go a different route and use different capabilities, all they need to do is add the logic for items #3 and #4. For example, Carl over at Quake Catcher Network would implement 3 and 4 to see if the client has the appropriate device for detection and then the wrapper to monitor it.

Just my 2 cents

Kevin Reed
ibm interactive
71 S. Wacker Dr
Chicago, IL, 60606-4637
312 529 2802 office
knreed@us.ibm.com email

******************

From: David Anderson
Date: 3 March 2009 17:19:27 GMT+01:00
To: Kevin Reed
Cc: Daniel Lombraña González, Ben Segal, David James Weir, Derrick Kondo, Predrag Buncic, Reinhard Prix, Rom Walton
Subject: Re: BOINC client strategies for virtualized jobs

I think this is the right approach. The key distinction is that a "resource" is something for which there can be contention, so that the client needs to know about it and do some kind of scheduling (e.g., allow only one GPU job to run at a time) whereas a "capability" has no contention.

The changes for the "test application" approach are all server-side and should be fairly minor; I'll work on it.

--------

I'll contact VMWare and ask for their cooperation, e.g. to get a special version of VMWare Server 2 that is a 1-click install and is smaller than 560 MB.

Of course, it would be better to use an open-source VM system; if anyone hears of interesting developments let us know.
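An illustrative sketch (not part of the original thread) of the project-supplied "test application" idea: a probe that reports capabilities, and a check that matches a job's requirements against them. All function and key names are hypothetical. The sketch assumes that finding an executable on the PATH is an adequate proxy for a capability; vmrun is the command-line tool distributed with VMware's VIX tooling, but a real probe would also need to check versions.

```python
import shutil

# Hypothetical capability names mapped to the executable used to detect them.
PROBES = {
    "vmware_vix": "vmrun",
    "python": "python3",
    "perl": "perl",
}


def probe_capabilities(which=shutil.which):
    """Return {capability: bool} based on whether each executable is found.

    `which` is injectable so the probe can be exercised without touching
    the real PATH.
    """
    return {name: which(exe) is not None for name, exe in PROBES.items()}


def can_run(job_requirements, capabilities):
    """A job is eligible only if every capability it requires is present."""
    return all(capabilities.get(req, False) for req in job_requirements)


if __name__ == "__main__":
    caps = probe_capabilities()
    print(caps)
    print("VM job eligible:", can_run(["vmware_vix"], caps))
```

Capabilities carry no contention, so the check is a plain yes/no match; resources such as GPUs would still need the client's scheduling logic.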
-- David

******************

From: Daniel Lombraña González
Date: 3 March 2009 17:34:23 GMT+01:00
To: David Anderson
Cc: Kevin Reed, Ben Segal, David James Weir, Derrick Kondo, Predrag Buncic, Reinhard Prix, Rom Walton
Subject: Re: BOINC client strategies for virtualized jobs

I agree with David and Kevin: detecting "capabilities" will be the key step for running Python applications, R scripts or whatever is needed within clients.

An interesting feature could be to allow BOINC clients to auto-install needed packages on clients by accepting a check-box or something similar that warns users about needed 3rd party software. For example, imagine that a project needs Python installed on the clients:

1.- The client detects if the Python interpreter is installed (as Kevin has explained)
2.- If it is not installed, ask the user if BOINC can automatically install it from a software repository.

The interesting aspect of this behavior is that BOINC can install (with the user's permission) dependencies for running experiments, opening BOINC to more researchers. For security reasons, it would be interesting to support a set of applications that are digitally signed by BOINC or something similar, to avoid security failures or malware.

I don't know if it would be better to have a central repository or to have each project's BOINC server act as a software repository for its clients. This last solution sounds really interesting, as researchers can define the "capabilities" required for running their experiments.

Daniel

******************

From: David Anderson
Date: 3 March 2009 20:15:51 GMT+01:00
To: Ben Segal
Cc: Kevin Reed, Daniel Lombraña González, David Weir, Derrick Kondo, Predrag Buncic, Reinhard Prix, Rom Walton
Subject: Re: BOINC client strategies for virtualized jobs

I'll add a "test application" mechanism to allow projects to probe host capabilities.
Let's postpone thinking about how to automatically install capabilities; let's assume that the user has to do it manually, and we can notify them via messages in the BOINC manager if they need to do so.

The next steps are to develop the app to probe VM presence/version, and to develop the wrapper (which seems to be largely done). However, to finish either of these we need to nail down the choice of VM system. VMWare Server 2 is the only one we've identified that meets our needs, but its installation process is too involved (and the 560MB download is probably too big) for volunteers.

It's conceivable that VMWare could provide us with a "VMWare Lite" tailored to the needs of BOINC. I know Mendel Rosenblum, who developed VMWare but has since left the company. I just sent him an email asking him for contacts in VMWare, and asking him if he knows of viable alternatives.

If anyone else knows of VMWare alternatives, please let us know.

-- David

******************

From: Predrag Buncic
Date: 3 March 2009 22:31:38 GMT+01:00
To: David Anderson
Cc: Ben Segal, Kevin Reed, Daniel Lombraña González, David Weir, Derrick Kondo, Reinhard Prix, Rom Walton
Subject: Re: BOINC client strategies for virtualized jobs

Perhaps Sun's VirtualBox (http://www.virtualbox.org/) could qualify as an alternative to VMware - it is cross-platform (Windows, Linux, MacOS, Solaris) and the Open Source Edition would probably be good enough for what we want to do. It has a command line as well as a Web service API. The download size is 30-36 MB (depending on the platform) and the memory footprint is ~17 MB. As a bonus, it can use the same image format as VMware (without VMware tools installed). The drawback is that it is still a bit more complicated to install and configure than VMware.
Regarding installation of software packages, we can handle that using the 'thin virtual machine' approach (like we do in CernVM) where all software is made available to the VM just in time by means of a special file system that takes care of efficient downloading, caching and possibly sharing software components...

Predrag

******************

From: Daniel Lombraña González
Date: 4 March 2009 10:26:40 GMT+01:00
To: Predrag Buncic
Cc: David Anderson, Ben Segal, Kevin Reed, David Weir, Derrick Kondo, Reinhard Prix, Rom Walton
Subject: Re: BOINC client strategies for virtualized jobs

I have been re-reading the papers that Kevin sent us. The technical report "Using Virtual Machines in Desktop Grid Clients for Application Sandboxing" from the CoreGrid team proposes to use Qemu+KQEMU.

After re-reading it, Qemu+KQEMU seems a good option. Its main benefits are that it runs on Windows and GNU/Linux, it is small (6 to 9 MB depending on the platform), and it seems that it is possible to access and control it via a socket. Additionally it is GPL, so it is possible to integrate the code with BOINC. It has some drawbacks: it is slower than VMware, and setting up an ethernet connection could be more difficult for the user than installing VMware software.

When I started with Qemu+KQEMU and VMware at CERN, the KQEMU accelerator was not GPL. Thus, we didn't use it because: 1) you have to install 3 packages (Qemu, KQEMU, and OpenVPN) on Windows machines to have an ethernet connection; 2) it wasn't GPL; and 3) the set up of the ethernet connection is more complex than in VMware.

I think that we can continue with VMware Server, but on the other hand we should re-check Qemu and see if it is a good solution for the future.

What do you think?

PS: Attached is the technical report.

Daniel

******************