After hardware upgrade, all GPU tasks fail with computation error

Profile CeSinge

Joined: 13 May 10
Posts: 14
Belgium
Message 75850 - Posted: 11 Feb 2017, 12:21:40 UTC

Hello,

I upgraded my hardware a few days ago to two Gigabyte GTX 1070 GPUs (on Windows 7 x64). BOINC still seems to run normally and detects both GPUs, but all GPU-related tasks start and then end after about 20 seconds with a computation error (even though BOINC Manager shows 100% progress).

This is the case for all Einstein, MilkyWay, Asteroids, ClimateChange and Cosmology@Home tasks.
LHC@Home, which does not use the GPU, works fine. Another machine on my account (a MacBook Pro) also still works perfectly.

In fact, the event log shows them as 'Starting task...' but never reports that they ended. Beyond that, I do not see how to debug this, as the event log doesn't seem to provide enough information to go further.
Upgrading to the latest BOINC 7.6.33 x64 didn't change anything. BOINC is not installed as a service.

Any suggestions? Thank you.
ID: 75850 · Report as offensive
Profile CeSinge

Joined: 13 May 10
Posts: 14
Belgium
Message 75851 - Posted: 11 Feb 2017, 12:39:51 UTC - in response to Message 75850.  

Adding a few details here:
My BOINC data directory is on my NAS, mapped as J:\Boinc\Data. This is not new, however, and it works for non-GPU tasks.
Enabling the 'tasks' debug in the event log shows me more, in particular that such tasks end with exit code 2 (0x2): The system cannot find the file specified.
It doesn't help much: why would a file not be found, and which file is the task trying to find?
Also, I've reset all tasks, but that doesn't change anything.

Here is an extract:
11/2/17 13:32:04 | Milkyway@Home | [task] Process for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1484858101_9167815_0 exited, exit code 2, task state 1
11/2/17 13:32:04 | Milkyway@Home | [task] task_state=EXITED for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1484858101_9167815_0 from handle_exited_app
11/2/17 13:32:04 | Milkyway@Home | [task] result state=COMPUTE_ERROR for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1484858101_9167815_0 from CS::report_result_error
11/2/17 13:32:04 | Milkyway@Home | [task] Process for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1484858101_9167815_0 exited
11/2/17 13:32:04 | Milkyway@Home | [task] exit code 2 (0x2): The system cannot find the file specified. (0x2)
11/2/17 13:32:04 | Milkyway@Home | Computation for task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1484858101_9167815_0 finished
11/2/17 13:32:04 | Milkyway@Home | [task] result state=COMPUTE_ERROR for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1484858101_9167815_0 from CS::app_finished
11/2/17 13:32:05 | Milkyway@Home | [task] task_state=EXECUTING for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1484858101_9036076_2 from start
11/2/17 13:32:05 | Milkyway@Home | Starting task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1484858101_9036076_2
11/2/17 13:32:05 | climateprediction.net | [task] result hadcm3s_7503_200012_168_514_010913140_1 checkpointed
11/2/17 13:32:13 | Milkyway@Home | [task] Process for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_1_1484858101_9132470_1 exited, exit code 2, task state 1
11/2/17 13:32:13 | Milkyway@Home | [task] task_state=EXITED for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_1_1484858101_9132470_1 from handle_exited_app
11/2/17 13:32:13 | Milkyway@Home | [task] result state=COMPUTE_ERROR for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_1_1484858101_9132470_1 from CS::report_result_error
11/2/17 13:32:13 | Milkyway@Home | [task] Process for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_1_1484858101_9132470_1 exited
11/2/17 13:32:13 | Milkyway@Home | [task] exit code 2 (0x2): The system cannot find the file specified. (0x2)
11/2/17 13:32:14 | Milkyway@Home | Computation for task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_1_1484858101_9132470_1 finished
11/2/17 13:32:14 | Milkyway@Home | [task] result state=COMPUTE_ERROR for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_1_1484858101_9132470_1 from CS::app_finished
11/2/17 13:32:14 | Milkyway@Home | [task] task_state=EXECUTING for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_1_1484858101_9167775_0 from start
11/2/17 13:32:14 | Milkyway@Home | Starting task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_1_1484858101_9167775_0
11/2/17 13:32:17 | Milkyway@Home | [task] Process for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1484858101_9036076_2 exited, exit code 2, task state 1
11/2/17 13:32:17 | Milkyway@Home | [task] task_state=EXITED for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1484858101_9036076_2 from handle_exited_app
11/2/17 13:32:17 | Milkyway@Home | [task] result state=COMPUTE_ERROR for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1484858101_9036076_2 from CS::report_result_error
11/2/17 13:32:17 | Milkyway@Home | [task] Process for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1484858101_9036076_2 exited
11/2/17 13:32:17 | Milkyway@Home | [task] exit code 2 (0x2): The system cannot find the file specified. (0x2)
11/2/17 13:32:17 | Milkyway@Home | Computation for task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1484858101_9036076_2 finished
11/2/17 13:32:17 | Milkyway@Home | [task] result state=COMPUTE_ERROR for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1484858101_9036076_2 from CS::app_finished
11/2/17 13:32:18 | Milkyway@Home | [task] task_state=EXECUTING for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1484858101_9145577_1 from start
11/2/17 13:32:18 | Milkyway@Home | Starting task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1484858101_9145577_1
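
An extract like the one above can be summarised with a short script. This is only a sketch, assuming the event log has been saved as plain text in the format shown here. The function name summarize_exits and the sample text are made up for illustration, and the regular expression only covers the "exited, exit code" lines seen in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
# 11/2/17 13:32:04 | Milkyway@Home | [task] Process for <task> exited, exit code 2, task state 1
EXIT_RE = re.compile(
    r"^(?P<when>\S+ \S+) \| (?P<project>[^|]+?) \| \[task\] "
    r"Process for (?P<task>\S+) exited, exit code (?P<code>-?\d+), task state \d+$"
)


def summarize_exits(log_text):
    """Return (list of (when, project, task, code), Counter keyed by (project, code))."""
    exits = []
    per_project = Counter()
    for line in log_text.splitlines():
        m = EXIT_RE.match(line.strip())
        if m:
            code = int(m.group("code"))
            exits.append((m.group("when"), m.group("project"), m.group("task"), code))
            per_project[(m.group("project"), code)] += 1
    return exits, per_project


# A few lines copied from the extract above, used as sample input.
SAMPLE = """\
11/2/17 13:32:04 | Milkyway@Home | [task] Process for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1484858101_9167815_0 exited, exit code 2, task state 1
11/2/17 13:32:04 | Milkyway@Home | [task] Process for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_2_1484858101_9167815_0 exited
11/2/17 13:32:05 | climateprediction.net | [task] result hadcm3s_7503_200012_168_514_010913140_1 checkpointed
11/2/17 13:32:13 | Milkyway@Home | [task] Process for de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_1_1484858101_9132470_1 exited, exit code 2, task state 1
"""

exits, per_project = summarize_exits(SAMPLE)
for (project, code), count in per_project.items():
    print(f"{project}: {count} task(s) exited with code {code}")
```

Grouping by project and exit code makes it easier to see whether every GPU project fails the same way or whether the failures differ per project.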
ID: 75851 · Report as offensive
ChristianB
Volunteer developer
Volunteer tester

Joined: 4 Jul 12
Posts: 321
Germany
Message 75856 - Posted: 11 Feb 2017, 14:17:14 UTC
Last modified: 11 Feb 2017, 14:17:22 UTC

You will have more luck looking at the application log files that are uploaded to the project server. On Einstein, for example, you can see this:
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "GeForce GTX 1070" by: NVIDIA Corporation
Max allocation limit: 2147483648
Global mem size: 0
Couldn't create OpenCL context (error: 2)!
initialize_ocl returned error [2007]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
01:50:33 (8664): [CRITICAL]: ERROR: MAIN() returned with error '5'


This means something is wrong with your driver. I've seen computers with 378.57 that don't produce errors.
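
A quick way to spot the same symptoms in other task logs is to search the text for the lines that stand out in the Einstein output above. This is a minimal sketch: the marker list is an assumption taken only from that one log, not an official list of OpenCL failure messages, and find_opencl_failures is a hypothetical helper name.

```python
# Assumed markers, taken from the Einstein@Home log quoted in this post.
FAILURE_MARKERS = (
    "Global mem size: 0",
    "Couldn't create OpenCL context",
    "OCL context null",
)


def find_opencl_failures(stderr_text):
    """Return the log lines that contain one of the failure markers."""
    hits = []
    for line in stderr_text.splitlines():
        if any(marker in line for marker in FAILURE_MARKERS):
            hits.append(line.strip())
    return hits


# Sample input copied from the log in this post.
SAMPLE_STDERR = """\
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "GeForce GTX 1070" by: NVIDIA Corporation
Max allocation limit: 2147483648
Global mem size: 0
Couldn't create OpenCL context (error: 2)!
initialize_ocl returned error [2007]
OCL context null
OCL queue null
"""

for hit in find_opencl_failures(SAMPLE_STDERR):
    print("suspicious:", hit)
```

A device that is detected by name but reports a global memory size of 0 and cannot create a context points at the driver or OpenCL runtime rather than at the project application.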
ID: 75856 · Report as offensive
Juha
Volunteer developer
Volunteer tester
Help desk expert

Joined: 20 Nov 12
Posts: 801
Finland
Message 75859 - Posted: 11 Feb 2017, 14:25:58 UTC - in response to Message 75850.  

Did you reinstall GPU drivers after upgrading hardware?

You seem to be having problems with CPDN and Cosmology CPU tasks as well. You'd better ask about those in their forums.
ID: 75859 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 75865 - Posted: 11 Feb 2017, 15:28:42 UTC

ID: 75865 · Report as offensive
Juha Kauppi

Joined: 12 Feb 17
Posts: 1
Finland
Message 75872 - Posted: 12 Feb 2017, 6:41:36 UTC - in response to Message 75850.  

I had similar problems, and the cause was the latest NVIDIA drivers (378.49). I rolled back to 376.33 and everything was working again. Hope this helps.
ID: 75872 · Report as offensive
Profile CeSinge

Joined: 13 May 10
Posts: 14
Belgium
Message 75880 - Posted: 12 Feb 2017, 20:42:09 UTC - in response to Message 75872.  

Hello, thanks to all for your input. I was ill and out of action for 24 hours, which explains my lack of responsiveness.

I did indeed upgrade to the latest NVIDIA drivers, 378.49. So the easiest thing to do was what Juha suggested: roll back to 376.33, and that was enough to make it run like a charm. Two or three GPU jobs have finished in the meantime, and that's much better than none at all.

Thank you again!
ID: 75880 · Report as offensive


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.