Changes between Version 6 and Version 7 of CreditNew

Author:
davea (IP: 128.32.18.181)
Timestamp:
11/03/09 14:37:20 (3 weeks ago)
Comment:

--

  • CreditNew

     v6  v7
     16  16  is the ratio of actual FLOPS to peak FLOPS.
     17  17
     18      GPUs typically have a much higher (50-100X) peak FLOPS than GPUs.
         18  GPUs typically have a much higher (50-100X) peak FLOPS than CPUs.
     19  19  However, application efficiency is typically lower
     20  20  (very roughly, 10% for GPUs, 50% for CPUs).
    156 156     It's not exactly "Actual FLOPs", since the most efficient
    157 157     version may not be 100% efficient.
        158   * There are two sources of variance in PFC(V):
        159     the variation in host efficiency,
        160     and possibly the variation in job size.
        161     If we have an ''a priori'' estimate of job size
        162     (e.g., workunit.rsc_fpops_est)
        163     we can normalize by this to reduce the variance,
        164     and make PFC*(V) converge more quickly.
        165   * ''a posteriori'' estimates of job size may exist also
        166     (e.g., an iteration count reported by the app)
        167     but using this for anything introduces a new cheating risk,
        168     so it's probably better not to.
        169
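The note added in v7 lines 158-164 can be illustrated with a small sketch. It is not taken from the BOINC server code: the sample values are invented, and `relative_spread` is a hypothetical helper. It only shows that dividing each PFC sample by an a priori size estimate removes the job-size component of the spread and leaves the host-efficiency component.

```python
from statistics import mean, pvariance

# Hypothetical samples for one app version V: (raw PFC, a priori size estimate).
# Job sizes vary by 4x; host efficiency varies by roughly +/-10%.
samples = [
    (1.1e12, 1.0e12),
    (3.6e12, 4.0e12),
    (2.1e12, 2.0e12),
    (0.9e12, 1.0e12),
    (4.4e12, 4.0e12),
]


def relative_spread(values):
    """Coefficient of variation: population standard deviation over mean."""
    return pvariance(values) ** 0.5 / mean(values)


raw = [pfc for pfc, _ in samples]
# Normalizing by the size estimate (standing in for workunit.rsc_fpops_est)
# leaves only the host-efficiency variation in the samples.
normalized = [pfc / est for pfc, est in samples]

raw_cv = relative_spread(raw)
norm_cv = relative_spread(normalized)
print(f"raw CV = {raw_cv:.3f}, normalized CV = {norm_cv:.3f}")
```

A smaller spread per sample means a running average such as PFC*(V) needs fewer samples to settle, which is the convergence benefit the note refers to.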
    158 170
    159 171  == Cross-project normalization ==
    190 202
    191 203  Assuming that hosts are sent jobs for a given app uniformly,
    192      then for a given app
        204  then, for that app,
    193 205  hosts should get the same average granted credit per job.
    194 206  To ensure this, for each application A we maintain the average VNPFC*(A),
    200 212
    201 213  There are some cases where hosts are not sent jobs uniformly:
    202       * job-size matching
        214   * job-size matching (smaller jobs sent to slower hosts)
    203 215   * GPUGrid.net's scheme for sending some (presumably larger)
    204 216     jobs to GPUs with more processors.
    205      In these cases we must scale
        217  In these cases average credit per job must differ between hosts,
        218  according to the types of jobs that are sent to them.
        219
        220  This can be done by dividing
        221  each sample in the computation of VNPFC* by WU.rsc_fpops_est
        222  (in fact, there's no reason not to always do this).
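As a rough illustration of v7 lines 217-222, the sketch below compares per-host averages with and without division by the size estimate. It is a simplified stand-in and does not reproduce the actual VNPFC* computation, which is defined elsewhere on the page; the host names, sample values, and the `per_host_average` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical (host, claimed PFC, wu.rsc_fpops_est) samples for one app.
# Host "slow" is sent small jobs and host "fast" large ones (job-size matching).
jobs = [
    ("slow", 1.0e12, 1.0e12),
    ("slow", 1.2e12, 1.0e12),
    ("fast", 8.0e12, 8.0e12),
    ("fast", 9.6e12, 8.0e12),
]


def per_host_average(jobs, normalize):
    """Average the samples per host, optionally dividing each by its size estimate."""
    totals = defaultdict(lambda: [0.0, 0])
    for host, pfc, fpops_est in jobs:
        sample = pfc / fpops_est if normalize else pfc
        totals[host][0] += sample
        totals[host][1] += 1
    return {host: total / count for host, (total, count) in totals.items()}


raw_avg = per_host_average(jobs, normalize=False)
scaled_avg = per_host_average(jobs, normalize=True)
print(raw_avg)
print(scaled_avg)
```

With raw samples the two hosts' averages differ by the job-size ratio, so forcing them to a common per-job average would be wrong. After dividing by the size estimate the averages are directly comparable.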
    206 223
    207 224  Notes:
    208       * This mechanism reduces the claimed credit of hosts
        225   * The host normalization mechanism reduces the claimed credit of hosts
    209 226     that are less efficient than average,
    210 227     and increases the claimed credit of hosts that are more efficient
    269 286  }}}
    270 287
    271      == Jobs versus app units ==
    272         To deal with this, we can weight jobs by workunit.rsc_flops_est.
    273
    274      If a project changes between jobs to app units,
    275      it must reset
    276
    277 288  == Cross-project scaling factors ==
    278 289
    288 299  granted credit = claimed credit.
    289 300
    290      For jobs that are replicated, granted credit is be
        301  For jobs that are replicated, granted credit should be
    291 302  set to the min of the valid results
    292 303  (min is used instead of average to remove the incentive
    315 326  == Job runtime estimates ==
    316 327
        328  Unrelated to the credit proposal, but in a similar spirit.
        329  The server will maintain ET*(H, V), the statistics of
        330  job runtimes (normalized by wu.rsc_fpops_est) per
        331  host and application version.
        332
        333  The server's estimate of a job's runtime is then
        334  {{{
        335  R(J, H) = wu.rsc_fpops_est * ET*(H, V)
        336  }}}
        337
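The runtime estimate added in v7 lines 328-336 could be sketched as follows. The page says only that ET*(H, V) holds "the statistics of" normalized runtimes, so using a plain running mean here is an assumption, and the `RuntimeStats` class and its method names are hypothetical rather than part of the BOINC server.

```python
from collections import defaultdict


class RuntimeStats:
    """Running mean of elapsed_time / rsc_fpops_est per (host, app version)."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, host, version, elapsed_seconds, rsc_fpops_est):
        """Add one completed job's normalized runtime to ET*(H, V)."""
        key = (host, version)
        self._sum[key] += elapsed_seconds / rsc_fpops_est
        self._count[key] += 1

    def estimate(self, host, version, rsc_fpops_est):
        """R(J, H) = wu.rsc_fpops_est * ET*(H, V), or None without history."""
        key = (host, version)
        if self._count[key] == 0:
            return None  # fallback for new hosts is out of scope for this sketch
        return rsc_fpops_est * self._sum[key] / self._count[key]


stats = RuntimeStats()
stats.record("host1", "v1", 1000.0, 1.0e12)  # 1.00e-9 seconds per estimated FLOP
stats.record("host1", "v1", 2200.0, 2.0e12)  # 1.10e-9 seconds per estimated FLOP
estimate = stats.estimate("host1", "v1", 4.0e12)
print(estimate)
```

Keeping the statistic normalized by wu.rsc_fpops_est lets one record per (host, version) pair serve jobs of any size.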
    317 338  == Implementation ==
    318 339
