OSCER users,

The scheduled maintenance outage has started. We'll alert you when it's finished.

Henry

________________________________
From: Neeman, Henry J. <[log in to unmask]>
Sent: Monday, June 24, 2024 3:32 PM
To: [log in to unmask] <[log in to unmask]>
Subject: Re: OSCER scheduled maintenance outage Wed June 26 8am-midnight CT supercomputer

OSCER users,

IMPORTANT IMPORTANT IMPORTANT IMPORTANT!!!

Before the scheduled maintenance outage starts on Wed June 26 8:00am:

Jobs that wouldn't finish before the scheduled maintenance outage starts won't be able to start at all until after the outage has ended (planned for Wed night at midnight). This is because the scheduler will only start a job if it can run for the full wall clock time limit that it has requested.

So, if you want a job to run before the scheduled maintenance outage begins, then in your batch scripts, you might need to reduce the wall clock time limit you request. (Once the maintenance period ends, this won't apply any more.)

If you have jobs that you've already submitted that are pending in the queue, then you might want to reduce their requested wall clock time limit, to give them a chance to start before the scheduled maintenance outage begins. The command is:

scontrol update JobId=######## TimeLimit=DD-HH:MM:SS

except REPLACE ######## with the job ID number, and REPLACE DD-HH:MM:SS with the 2-digit number of days, the 2-digit number of hours (beyond the days), the 2-digit number of minutes (beyond the days and hours), and the 2-digit number of seconds (beyond the days, hours, and minutes).

You have to pick a wall clock time limit short enough that the job can run to its time limit before the start of the scheduled maintenance outage.

For example, suppose job 123456 had requested 48 hours of wall clock time limit, but now there are fewer than 48 hours before the start of the scheduled maintenance outage. So job 123456 won't be able to run until after the scheduled maintenance outage ends.
But we could change the requested wall clock time limit for job 123456 to, for example, 30 hours, like this:

scontrol update JobId=123456 TimeLimit=01-06:00:00

In that case, job 123456 would have a *CHANCE* (but *NOT A GUARANTEE*) to run before the outage starts.

Henry

________________________________
From: OSCER users <[log in to unmask]> on behalf of Neeman, Henry J. <[log in to unmask]>
Sent: Monday, June 24, 2024 3:28 PM
To: [log in to unmask] <[log in to unmask]>
Subject: Re: OSCER scheduled maintenance outage Wed June 26 8am-midnight CT supercomputer

OSCER users,

REMINDER: Scheduled maintenance outage Wed June 26 8am-midnight CT

________________________________
From: Neeman, Henry J. <[log in to unmask]>
Sent: Thursday, June 20, 2024 5:54 PM
To: OSCER users <[log in to unmask]>
Subject: OSCER scheduled maintenance outage Wed June 26 8am-midnight CT supercomputer

OSCER users,

Scheduled maintenance outage Wed June 26 8am-midnight Central Time

This will affect OSCER's supercomputer only (NOT OURdisk, NOT OURcloud, NOT OURRstore).

Planned supercomputer maintenance tasks:

(1) Upgrade the Slurm batch scheduler from Enterprise Linux 7 (EL7) to Enterprise Linux 9 (EL9). (We'll mostly be using the free variant Rocky 9.)

(2) Upgrade various internal supercomputer virtual machines from EL7 to EL9.

We expect the work to be finished by midnight Central Time.

After the scheduled maintenance outage, we'll be converting compute nodes from EL7 to EL9, ramping down the EL7 nodes and ramping up the EL9 nodes.

As always, we apologize for the inconvenience -- our goal is to make OSCER resources even better!

If you have any questions or concerns, please email us at: [log in to unmask]

The OSCER Team
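P.S. The hours-to-TimeLimit conversion in the messages above can be scripted if you have several pending jobs to adjust. A minimal sketch (the helper function name is our own, the job ID 123456 and the 30-hour value are just the example from the thread, and the script only *prints* the scontrol command so you can review it before running it yourself):

```shell
#!/bin/sh
# Convert a whole number of hours into Slurm's DD-HH:MM:SS TimeLimit format.
hours_to_timelimit() {
    # $1 = total whole hours requested
    days=$(( $1 / 24 ))
    hrs=$(( $1 % 24 ))
    printf '%02d-%02d:00:00' "$days" "$hrs"
}

JOB_ID=123456                          # example job ID from the thread
NEW_LIMIT=$(hours_to_timelimit 30)     # 30 hours -> 01-06:00:00

# Print the command for review rather than executing it:
echo "scontrol update JobId=$JOB_ID TimeLimit=$NEW_LIMIT"
```

Running this prints `scontrol update JobId=123456 TimeLimit=01-06:00:00`, matching the example above.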