OSCER users,

Today's scheduled maintenance outage is complete, with two caveats:

(a) The password reset webpage is offline. We hope to have it in production tomorrow (Thu Feb 29), though of course we can't guarantee that.
(b) Login node schooner2.oscer.ou.edu is still down. We hope to have it in production tomorrow, but again, that's not guaranteed.

If you have trouble logging in to schooner.oscer.ou.edu, you can try logging in to

schooner1.oscer.ou.edu

OR

schooner3.oscer.ou.edu

Logins to dtn1 and dtn2 are working fine.

---

We were able to complete almost all of the tasks we listed, with the exception of task 2 (updating Linux on OURdisk) and the password reset webpage, both of which we hope to complete soon.

Many thanks to the whole OSCER team for this amazing accomplishment -- y'all are the best!

The OSCER Team ([log in to unmask])

________________________________
From: Neeman, Henry J. <[log in to unmask]>
Sent: Monday, February 26, 2024 12:29 PM
To: [log in to unmask] <[log in to unmask]>
Subject: Re: OSCER scheduled maintenance outage Wed Feb 28 8am-midnight

OSCER users,

REMINDER: OSCER scheduled maintenance outage Wed Feb 28 8am-midnight.

Systems affected: supercomputer, OURdisk, OURcloud, OURRstore tape archive.

Details are earlier in this e-mail thread.

---

IMPORTANT IMPORTANT IMPORTANT IMPORTANT!!!

Before the scheduled maintenance outage starts on Wed Feb 28 8:00am:

Jobs whose requested wall clock time limit extends past the start of the scheduled maintenance outage won't be able to start at all until after the outage has ended (planned for Wed night at midnight). That way, once such a job does start, it can run for the full wall clock time limit that it requested.

So, if you want a job to run before the scheduled maintenance outage begins, then in your batch scripts, you might need to reduce the wall clock time limit you request. (Once the maintenance period ends, this won't apply any more.)
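
As a rough illustration of the rule above, here is a minimal shell sketch that works out the longest wall clock time limit that would still end before an outage begins, and prints it in Slurm's DD-HH:MM:SS form. The function names, the 15-minute safety margin, and the use of epoch seconds as inputs are assumptions made for this example and are not part of the OSCER announcement.

```shell
# Sketch only: helper names and the default margin are hypothetical.

# Format a duration given in seconds as DD-HH:MM:SS.
fmt_timelimit() {
    total=$1
    days=$(( total / 86400 ))
    hours=$(( (total % 86400) / 3600 ))
    mins=$(( (total % 3600) / 60 ))
    secs=$(( total % 60 ))
    printf '%02d-%02d:%02d:%02d\n' "$days" "$hours" "$mins" "$secs"
}

# Print the number of seconds left before the outage, minus a safety
# margin (default 900 seconds, an assumed value). Prints 0 if no time is left.
# Arguments: current time and outage start time, both in epoch seconds.
seconds_before_outage() {
    now=$1
    outage=$2
    margin=${3:-900}
    left=$(( outage - now - margin ))
    [ "$left" -gt 0 ] || left=0
    echo "$left"
}
```

On a system with GNU date (an assumption), the two epoch values could come from `date +%s` and `date -d '2024-02-28 08:00' +%s`. A limit computed this way only gives the job a chance to start before the outage, not a guarantee, since the job still has to wait for free resources.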
If you have jobs that you've already submitted that are pending in the queue, then you might want to reduce their requested wall clock time limit, to give them a chance to start before the scheduled maintenance outage begins.

The command is:

scontrol update JobId=######## TimeLimit=DD-HH:MM:SS

except REPLACE ######## with the job ID number, and REPLACE DD-HH:MM:SS with 2-digit number of days, 2-digit number of hours (beyond the number of days), 2-digit number of minutes (beyond the days and hours) and 2-digit number of seconds (beyond the days, hours and minutes).

You have to pick a wall clock time limit short enough that the job can run to its time limit before the start of the scheduled maintenance outage.

For example, suppose job 123456 had requested 48 hours of wall clock time limit, but now there are less than 48 hours before the start of the scheduled maintenance outage.

So job 123456 won't be able to run until after the scheduled maintenance outage ends.

But, we could change the requested wall clock time limit for job 123456 to, for example, 30 hours, like this:

scontrol update JobId=123456 TimeLimit=01-06:00:00

In that case, job 123456 would have a *CHANCE* (but *NOT A GUARANTEE*) to run before the outage starts.

Henry

________________________________
From: OSCER users <[log in to unmask]> on behalf of Neeman, Henry J. <[log in to unmask]>
Sent: Thursday, February 22, 2024 10:57 AM
To: [log in to unmask] <[log in to unmask]>
Subject: OSCER scheduled maintenance outage Wed Feb 28 8am-midnight

OSCER users,

OSCER will hold a scheduled maintenance outage Wed Feb 28 8am-midnight.

Systems affected: supercomputer, OURdisk, OURcloud, OURRstore tape archive.

The OSCER team will perform the following maintenance items (time permitting), in the following priority order:

(1) Shift to a new user database instance.
(2) Update Linux on OURdisk.
(3) Decommission our one older OURdisk monitor node (we have 5 newer monitor nodes, which are already in production alongside the older one, and the recommendation is to have either 3 or 5).
(4) Time permitting, upgrade one of the condominium compute nodes to 25GE (from its current 10GE).
(5) Time permitting, reconfigure the routing of the supercomputer's InfiniBand-to-Ethernet gateway appliances, and shift onto these IB-to-Ethernet gateway appliances the network traffic of (i) one compute node (for testing) and (ii) one of our scratch subsystems.
(6) Time permitting, shift all nodes to the IB-to-Eth gateway appliances.

We apologize for the inconvenience -- as always, our goal is to make OSCER resources even better!

---
Henry Neeman ([log in to unmask])
Director, OU Supercomputing Center for Education & Research (OSCER)
Associate Professor, Gallogly College of Engineering
Adjunct Associate Professor, School of Computer Science
OU Information Technology
The University of Oklahoma

Engineering Lab 212, 200 Felgar St, Norman OK 73019
405-325-5386 (office), 405-325-5486 (fax), 405-245-3823 (cell),
[log in to unmask] (to email me a text message)
http://www.oscer.ou.edu/