OSCER-USERS-L Archives

OSCER users

OSCER-USERS-L@LISTS.OU.EDU

Subject:
From: Henry Neeman <[log in to unmask]>
Reply-To: Henry Neeman <[log in to unmask]>
Date: Tue, 29 Nov 2022 14:43:27 -0600
Content-Type: text/plain
Parts/Attachments: text/plain (57 lines)

OSCER and OURdisk users,

Do you have anaconda/miniconda packages on OURdisk and/or
on /scratch?

If so, please reply to me only (*NOT* to the list), so
we can work together to make this better.
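
If you aren't sure where your conda installation lives, the sketch
below is one rough way to check. It assumes that conda has set the
CONDA_PREFIX environment variable (which it does when an environment
is activated), and the function name conda_prefix_under is
hypothetical. /scratch comes from this message; the mount point of
OURdisk is not given here, so pass whatever path applies to you.

```python
import os
from pathlib import Path


def conda_prefix_under(roots, prefix=None):
    """Return the first root that contains the conda prefix, or None.

    roots  -- directories to test against (e.g. "/scratch"); which
              paths are Ceph-backed on your system is an assumption
              you must supply.
    prefix -- conda install/environment directory; defaults to the
              CONDA_PREFIX environment variable.
    """
    prefix = prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        return None  # no active conda environment was detected
    resolved = Path(prefix).resolve()
    for root in roots:
        root_path = Path(root).resolve()
        # Compare path components, not string prefixes, so that
        # "/scratch2" is not mistaken for being under "/scratch".
        if resolved == root_path or root_path in resolved.parents:
            return str(root)
    return None


if __name__ == "__main__":
    print(conda_prefix_under(["/scratch"]))
```

A result of None only means the active environment is not under the
roots you listed; other conda installations elsewhere are not checked.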

---

Explanation:

Last week, OSCER had a call with a Ceph expert at the
San Diego Supercomputer Center.

(Both OURdisk and some of Schooner's /scratch directories
use Ceph.)

One of the key bits of guidance he gave us was this:

*DON'T* put anaconda/miniconda packages on Ceph, period.

Apparently, invoking packages from anaconda/miniconda
during a run opens a huge number of files at the same time
(presumably the files in the relevant anaconda packages).

We've learned that every file on Ceph that's
open at the same time consumes a little bit of RAM
on the Ceph metadata servers.

That's fine if a modest number of files are open at the
same time.

But apparently anaconda/miniconda can open zillions of files
at the same time, which can wreak havoc with the Ceph metadata
servers, causing massive problems with that Ceph system.
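
To get a rough sense of scale, the sketch below counts the files
under a directory tree, such as a conda installation. This is only
a proxy: it shows how many files exist, not how many are open at
the same time during a run. The function name count_files is
hypothetical.

```python
import os
import sys


def count_files(top):
    """Count non-directory entries under top.

    os.walk does not descend into symlinked directories by default,
    so files reached only through such links are not counted.
    A nonexistent top yields 0.
    """
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(top):
        total += len(filenames)
    return total


if __name__ == "__main__":
    # Usage: python count_files.py /path/to/conda/install
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{target}: {count_files(target)} files")
```

Note that walking a large tree is itself a metadata-heavy operation,
so it is best run once rather than repeatedly.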

Disclaimer: I'm not a Ceph expert -- I've gleaned this
information from various discussions we've had over the
past few months.

---

Henry Neeman ([log in to unmask])
Director, OU Supercomputing Center for Education & Research (OSCER)
Associate Professor, Gallogly College of Engineering
Adjunct Associate Professor, School of Computer Science
OU Information Technology
The University of Oklahoma

Engineering Lab 212, 200 Felgar St, Norman OK 73019
405-325-5386 (office), 405-325-5486 (fax), 405-245-3823 (cell),
[log in to unmask] (to e-mail me a text message)
http://www.oscer.ou.edu/
