The head node (login.wrg.york.ac.uk) is intended for light duties only, such as checking your file store, submitting jobs and light compilation. It is not intended to be used as a compute node.
If you need to run compute-intensive work, either submit a batch job or use an interactive queue (using qrsh or qlogin). The Sun Grid Engine section of the help files covers both batch jobs and interactive sessions.
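Here is a minimal batch job, sketched under stated assumptions. qsub reads lines beginning with `#$` in a job script as if they were command-line options, so even a Python script can carry its own settings. The interpreter path `/usr/bin/python3`, the job name `example_job` and the one-hour `h_rt` limit are assumptions. Check the Sun Grid Engine notes for the values appropriate to this cluster.

```python
#!/usr/bin/env python3
# Minimal SGE batch job written in Python (a sketch; interpreter path,
# job name and run-time limit are assumptions, not site defaults).
# qsub reads the "#$" lines below as options; Python treats them as comments.
#$ -S /usr/bin/python3
#$ -N example_job
#$ -cwd
#$ -j y
#$ -l h_rt=01:00:00
import os
import socket


def main() -> str:
    # SGE sets JOB_ID inside a batch job; it is absent when run by hand.
    job_id = os.environ.get("JOB_ID", "not-under-sge")
    host = socket.gethostname()
    message = f"job {job_id} running on {host}"
    print(message)
    return message


if __name__ == "__main__":
    main()
```

You would submit it from the head node with `qsub example_job.py` (a hypothetical file name). Because of `-cwd` and `-j y`, its output goes to a single file in the submission directory, named `example_job.o` followed by the job ID.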
If you wish to use grid-related tools, use globus.wrg.york.ac.uk and the Globus tools, so that jobs are again scheduled through Sun Grid Engine where appropriate.
Sun Grid Engine, Fair Share Policy, Deadlines, Etc.
The SGE system is designed to keep the load on each compute node and on the head nodes within appropriate levels.
The system works on a Fair Share Policy. This means that, subject to the caveats outlined below, each user gets an equal share of the available compute resources over a long period of time. In the short term a user can use more than a 1/N share (where there are N users). Past usage is gradually forgotten. After a while, if the system is not fully used, you can again use a significant amount of resource. If the system is lightly loaded it allows more usage per user. However, if a user who has not used the system before submits jobs, those jobs will have high priority and may dominate.
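One way to picture the "forgetting" of past usage is exponential decay. The sketch below is a simplified model, not SGE's exact scheduling algorithm. It assumes that past usage is halved every `halftime_h` hours. SGE's scheduler configuration does have a `halftime` parameter for this kind of ageing, but the one-week value used here is an assumption. Users are then ranked by how far their decayed usage falls below an equal 1/N share.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    hours_ago: float  # how long ago the usage happened
    cpu_hours: float  # amount of CPU time consumed


def decayed_usage(records: list[UsageRecord], halftime_h: float) -> float:
    """Sum past usage, halving the weight of each record every halftime_h hours."""
    return sum(r.cpu_hours * 0.5 ** (r.hours_ago / halftime_h) for r in records)


def priorities(usage: dict[str, list[UsageRecord]],
               halftime_h: float = 168.0) -> dict[str, float]:
    """Priority per user: equal 1/N target share minus actual share of decayed usage.

    A positive value means the user has recently had less than their fair share.
    """
    decayed = {user: decayed_usage(recs, halftime_h) for user, recs in usage.items()}
    total = sum(decayed.values())
    target = 1.0 / len(usage)
    return {user: target - (d / total if total > 0 else 0.0)
            for user, d in decayed.items()}


if __name__ == "__main__":
    usage = {
        "alice": [UsageRecord(hours_ago=1, cpu_hours=100)],    # heavy recent use
        "bob": [UsageRecord(hours_ago=500, cpu_hours=100)],    # same amount, long ago
        "carol": [],                                           # new user
    }
    for user, prio in sorted(priorities(usage).items(), key=lambda kv: -kv[1]):
        print(f"{user}: {prio:+.3f}")
```

With these numbers carol, who has no past usage, gets the highest priority and alice the lowest. This matches the behaviour described above, where a new user's jobs can dominate. Bob's identical but older usage has largely decayed away.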
There is a limit imposed on the number of running jobs a user may have at any one time.
If you need jobs finished before a deadline, the system administrators can allow you to run more jobs and/or override the fair share system to boost your priority. However, after the deadline period your priority will be reduced. If you know that you have a deadline coming up, you can use an Advance Reservation (more information on this coming soon) to block-book time in the future.
Job Duration, Checkpointing
Keep job durations as short as possible, as this makes it easier for the system to share resources fairly. We are looking at ways to 'timeslice' jobs so that queue time limits can be kept short. Under this scheme, a job that requires, say, 5 days of processing would run as 5 chunks of 1 day each. This is still in development.
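Until timeslicing is available, you might approximate it by hand. Submit a long run as a chain of shorter jobs, each held until the previous one finishes, using qsub's `-hold_jid` option. The sketch below only builds the qsub command lines; it does not submit anything. The script name `run_chunk.sh`, the job-name prefix and the 24-hour `h_rt` limit per chunk are hypothetical. Each chunk is assumed to resume from where the previous one stopped, for example by reading a checkpoint.

```python
import shlex


def chunk_commands(script: str, n_chunks: int, hours_per_chunk: int,
                   prefix: str = "chunk") -> list[str]:
    """Build qsub command lines for a chain of jobs that run one after another.

    Every job after the first uses -hold_jid so SGE will not start it until
    the previous chunk (referenced here by job name) has finished.
    """
    commands = []
    for i in range(1, n_chunks + 1):
        args = ["qsub", "-N", f"{prefix}{i}", "-l", f"h_rt={hours_per_chunk}:00:00"]
        if i > 1:
            args += ["-hold_jid", f"{prefix}{i - 1}"]
        args.append(script)
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # 5 days of processing as 5 chunks of 1 day each.
    for cmd in chunk_commands("run_chunk.sh", n_chunks=5, hours_per_chunk=24):
        print(cmd)
```

Because `-hold_jid` is given job names here, the names should be unique among your queued jobs. Passing the job IDs that qsub reports is an alternative.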
Users are advised to use checkpointing where possible (although not all programs support it); see the Sun Grid Engine notes for more information. Checkpointing saves the state of a job part way through its run. If the system crashes, the job can then be restarted from a progress file rather than from the very beginning. An enhanced checkpointing system is being investigated.
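Where a program has no built-in checkpointing, a simple application-level version can often be added. The sketch below illustrates the idea; it is not the SGE checkpointing mechanism described in the Sun Grid Engine notes. It saves its step counter and running total to a progress file every few steps. The name `progress.json` used in the usage note is hypothetical. On restart, the program continues from the last saved step instead of step 0. The file is written under a temporary name and then renamed, so a crash during the write does not leave a corrupt checkpoint.

```python
import json
import os
from typing import Optional


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically: write a temporary file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(path: str, n_steps: int, every: int = 10,
        stop_after: Optional[int] = None) -> int:
    """Sum the squares of 0..n_steps-1, checkpointing every `every` steps.

    stop_after simulates a crash by returning -1 after that many steps in
    this run; a later call resumes from the last saved checkpoint.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < n_steps:
        state["total"] += state["step"] ** 2  # the "real work" for one step
        state["step"] += 1
        done_this_run += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
        if stop_after is not None and done_this_run >= stop_after:
            return -1  # simulated crash: work since the last checkpoint is lost
    save_checkpoint(path, state)
    return state["total"]
```

For example, `run("progress.json", 100, stop_after=25)` stops after step 25 with the checkpoint at step 20. Calling `run("progress.json", 100)` afterwards redoes only steps 20 to 99 and returns the same total as an uninterrupted run. Checkpointing more often loses less work after a crash but spends more time writing files.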