OSCER-USERS-L Archives

OSCER users

OSCER-USERS-L@LISTS.OU.EDU

Subject:
From: Henry Neeman <[log in to unmask]>
Reply To: Henry Neeman <[log in to unmask]>
Date: Tue, 25 Apr 2023 16:09:04 -0500

OSCER users,

UPDATE:

The OSCER Operations Team has made good progress on
the OURdisk problem, but isn't done yet, and there's
a high probability that the issue won't get resolved today.

Working closely all day with our external Ceph expert
consultant, who has now gone to sleep (because it's
a lot later in Europe than here), we've restored our
Ceph monitor servers to service.

But, we're still troubleshooting what's likely to be
a server-level network issue, which is preventing our
monitor servers from maintaining a "quorum" (mutual agreement
about the state of the system).

We're continuing to move forward on this and have
multiple approaches to try.

As always, we're very aware of how disruptive this issue is
and are doing our best to get it resolved as quickly as
we can. We apologize for the trouble.

Henry

----------

On Mon, 24 Apr 2023, Henry Neeman wrote:

>OSCER users,
>
>UPDATE:
>
>We hope to have OURdisk in production later today. Our team
>has been working on it since yesterday morning.
>
>We think we've isolated the issue to the network driver layer,
>and we're collaborating closely with our external Ceph expert
>consultant to resolve the issue.
>
>We also hope to have a more permanent fix in place soon,
>most likely during the next maintenance outage, which is
>likely to be mid-May, in order to avoid dissertation and
>thesis deadlines.
>
>But, we can't know for sure until we've returned OURdisk to
>service.
>
>Again, we apologize for the trouble.
>
>Henry
>
>----------
>
>On Sun, 23 Apr 2023, Henry Neeman wrote:
>
>OSCER users,
>
>OURdisk is down. Our team is working to return it to service,
>but that'll most likely take until tomorrow (Mon Apr 24).
>
>We apologize for the trouble.
>
>---
>
>Henry Neeman ([log in to unmask])
>Director, OU Supercomputing Center for Education & Research (OSCER)
>Associate Professor, Gallogly College of Engineering
>Adjunct Associate Professor, School of Computer Science
>OU Information Technology
>The University of Oklahoma
>
>Engineering Lab 212, 200 Felgar St, Norman OK 73019
>405-325-5386 (office), 405-325-5486 (fax), 405-245-3823 (cell),
>[log in to unmask] (to e-mail me a text message)
>http://www.oscer.ou.edu/
