
Differences

This shows you the differences between two versions of the page.


about:functionality:faulttol [ - 2015/03/09, 12:58 - ] (current)
Line 1: Line 1:
 +====== B. Fault Detection & Recovery Capabilities ======
 +
 +===== B.1. Job Cancellation =====
 +
 +
 +**Description:**
 +
 +    * A job can be cancelled for several reasons, for example by the local resource management system when it exceeds its wall time limit, or by the system administrator to preserve system performance.
 +
 +**Support in Last Release:**
 +
 +    * GW detects job cancellation when the job ends without reporting an exit code, and then requests migration.
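
The detection rule above can be illustrated with a minimal Python sketch. It is not GW's actual code: the function names and the `"migrate"`/`"done"` return values are hypothetical, and treating every reported exit code (including non-zero ones) as a normal completion is an assumption made only for this example.

```python
from typing import Optional


def was_cancelled(exit_code: Optional[int]) -> bool:
    """Hypothetical check: a job that ended without an exit code is treated as cancelled."""
    return exit_code is None


def next_action(exit_code: Optional[int]) -> str:
    """Decide what to do with a job that has stopped running."""
    if was_cancelled(exit_code):
        # No exit code was reported, so the job is assumed to have been cancelled.
        return "migrate"
    # Assumption: any reported exit code counts as a normal completion.
    return "done"
```

For example, `next_action(None)` requests migration, while `next_action(0)` does not.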
 +
 +===== B.2. Remote System Crash or Outage =====
 +
 +
 +**Description:**
 +
 +    * Grid resources can fail unpredictably. These failures can affect hardware, operating system and Grid middleware components.
 +
 +**Support in Last Release:**
 +
 +    * GW detects a system crash when polling of the job fails, and then requests migration.
 +
 +
 +
 +===== B.3. Network Disconnection =====
 +
 +
 +**Description:**
 +
 +    * Grid connections can fail unpredictably. Moreover, system administrators are free to disconnect their resources, for example for local site maintenance.
 +
 +**Support in Last Release:**
 +
 +    * GW detects a network disconnection when polling of the job fails, and then requests migration.
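
Polling-based detection, as described for remote crashes and network disconnections, might look roughly like the sketch below. It is only an illustration: `poll_job`, the `poll` callback, the `max_failures` threshold and the use of `ConnectionError` are all assumptions, not part of GW. A real polling loop would also wait between attempts, which is left out here to keep the example short.

```python
from typing import Callable


def poll_job(poll: Callable[[], str], max_failures: int = 3) -> str:
    """Poll a job until it finishes or polling fails max_failures times in a row.

    `poll` is a hypothetical callback that returns the job state
    ("running" or "done") and raises ConnectionError when the remote
    resource cannot be reached.
    """
    failures = 0
    while True:
        try:
            state = poll()
        except ConnectionError:
            failures += 1
            if failures >= max_failures:
                # The resource is considered lost: request migration.
                return "migrate"
            continue
        failures = 0  # a successful poll resets the failure counter
        if state == "done":
            return "done"
```

Requiring several consecutive failures before migrating is a design choice of this sketch, so that a single transient error does not trigger a migration; the source does not say how many failed polls GW tolerates.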
 +
 +===== B.4. Client Fault Tolerance =====
 +
 +
 +**Description:**
 +
 +    * The system running the scheduler can itself fail.
 +
 +**Support in Last Release:**
 +
 +    * GW periodically saves its state so that it can recover from a local failure.
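
One common way to implement periodic state saving is to write each snapshot atomically and reload it on restart. The sketch below shows that idea under stated assumptions: the JSON format, the file layout and the function names `save_state` and `load_state` are hypothetical and do not describe how GW actually stores its state.

```python
import json
import os
import tempfile


def save_state(state: dict, path: str) -> None:
    """Write the state atomically: write a temporary file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before renaming
    os.replace(tmp_path, path)  # the old snapshot stays intact until this point


def load_state(path: str) -> dict:
    """Return the last saved state, or an empty state if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

A scheduler built this way would call `save_state` at a fixed interval and `load_state` once at startup, so a crash loses at most the changes made since the last snapshot.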
  