In the cases where "CL CPU-1M Particles" passes,
clEnqueueNDRangeKernel is called with global_work_size = < 1000000 >, local_work_size = < 16 >.
In the failing case, the call changes to
clEnqueueNDRangeKernel with global_work_size = < 1000192 >, local_work_size = < 256 >. Otherwise the OpenCL calls are identical.
Debugging the calls to the runtime (RT) functions clCreateBuffer, clSetKernelArg, and clEnqueueNDRangeKernel confirmed that this is a bug in Gpu_Caps_Viewer. The application creates buffers of 1000000 elements, but the kernel runs over an NDRange of 1000192 work-items and uses the unmodified global id to index those buffers, so the last 192 work-items access memory past the end of the buffers.
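
To illustrate the arithmetic: 1000192 is exactly 1000000 rounded up to the next multiple of 256, while 1000000 is already a multiple of 16, which would explain why only the 256 case overruns. That the application pads the global size this way is an inference from the numbers, not something confirmed from its source. The sketch below is a host-side C model only; round_up, out_of_range_items, and guarded_update are hypothetical helper names, not part of the application or of the OpenCL API. The usual fix is to pass the element count to the kernel and return early when the global id is out of range.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of local (local must be non-zero).
   Hypothetical helper: models how a padded global_work_size could be derived. */
size_t round_up(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}

/* Number of work-items whose global id lies past the end of an n-element buffer
   when the NDRange is padded to a multiple of local. */
size_t out_of_range_items(size_t n, size_t local)
{
    return round_up(n, local) - n;
}

/* Host-side model of a guarded kernel body: gid stands in for get_global_id(0).
   Work-items in the padded tail return without touching the buffer. */
void guarded_update(float *buf, size_t n, size_t gid)
{
    if (gid >= n)
        return;          /* padded work-item: nothing to do */
    buf[gid] += 1.0f;    /* placeholder for the real per-particle update */
}
```

With these numbers, round_up(1000000, 16) leaves the size unchanged, while round_up(1000000, 256) gives 1000192, leaving 192 work-items that need the bounds check.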