Your home directory and scratch space are provided by a 1.2 PB Lustre file system. Most of Falcon's storage is on traditional hard drives, and a smaller pool of solid-state disks (SSDs) is also available.
PIs are limited to 25 TB of storage on the Lustre file system. This is not a hard quota; it is enforced by monitoring usage. If your project needs to store more than 25 TB, it can probably be accommodated. Send an email to help@c3plus3.org detailing:
The Falcon Operations Committee will review your request and approve or deny it based on the current amount of free space in the file system and the requested timeline.
When a client (a compute node from your job) needs to create or access a file, the client queries the metadata server (MDS) and the metadata target (MDT) for the layout and location of the file's stripes. Once the file is opened and the client obtains the striping information, the MDS is no longer involved in the file I/O process. The client interacts directly with the object storage servers (OSSes) and object storage targets (OSTs) to perform I/O operations such as locking, disk allocation, storage, and retrieval.
If multiple clients try to read and write the same part of a file at the same time, the Lustre distributed lock manager enforces coherency so that all clients see consistent results.
Jobs running on Falcon compete for shared resources in the Lustre filesystem. Each Lustre server can handle only a limited number of I/O requests (read, write, stat, open, close, etc.) per second. An excessive number of such requests, from one or more users and one or more jobs, leads to contention for storage resources. Contention slows your applications and degrades the overall health of the Lustre filesystem. To reduce contention and improve performance, please apply the practices below to your compute jobs.
The ls -l command displays the ownership, permissions, and size of every file and directory it lists. Ownership and permission metadata are stored on the MDTs, but file size is only available from the OSTs. As a result, ls -l issues RPCs to the MDS/MDT and to the OSSes/OSTs for every file and directory listed. The RPCs to the OSSes/OSTs are costly and can take a long time to complete when there are many files and directories.
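When you only need file names, plain ls (no -l, no colour) should need only the directory contents from the MDS/MDT and so skip the per-file size requests to the OSTs. A minimal POSIX shell sketch; list_names is a hypothetical helper name, not a Falcon tool:

```shell
# list_names DIR: print only the names of the entries in DIR, one per line.
# No -l, so ls has no need to ask the OSTs for each file's size.
list_names() {
    # The leading backslash bypasses a shell alias such as
    # ls='ls --color=auto'; colouring the output can require a stat
    # of every entry, which brings back the per-file requests.
    \ls -1 -- "$1"
}
```

Interactively, typing \ls instead of ls -l in a large directory follows the same idea.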
Opening a file takes a lock on its parent directory, so when many files in the same directory are opened at once, the opens contend for that lock. If you have a large number of files (thousands or more), split them across multiple subdirectories to minimize contention.
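A one-time reorganization like this can be scripted. The POSIX shell sketch below moves the regular files in one flat directory into N subdirectories in round-robin order. The function name split_into_subdirs and the bucket_NNN naming are hypothetical, files are assigned in directory order rather than by name, and it assumes file names contain no newline characters.

```shell
# split_into_subdirs DIR N: move the regular files directly inside DIR
# into DIR/bucket_000 ... DIR/bucket_<N-1>, round-robin.
split_into_subdirs() {
    dir=$1
    nbuckets=$2
    # Snapshot the file list before moving anything, so the moves do
    # not disturb the directory listing while it is being read.
    list=$(mktemp) || return 1
    find "$dir" -mindepth 1 -maxdepth 1 -type f > "$list"
    # Create the subdirectories up front.
    b=0
    while [ "$b" -lt "$nbuckets" ]; do
        mkdir -p "$dir/$(printf 'bucket_%03d' "$b")"
        b=$((b + 1))
    done
    i=0
    while IFS= read -r f; do
        # mv within one file system is a rename: no file data is copied.
        mv -- "$f" "$dir/$(printf 'bucket_%03d' $((i % nbuckets)))/"
        i=$((i + 1))
    done < "$list"
    rm -f "$list"
}
```

Run it once, outside your compute jobs, and update your program's input paths to match the new layout.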
Accessing small files on the Lustre filesystem is inefficient, so expect lower performance. Alternatively, use the lfs setstripe command to place your small files in the SSD pool, as in this example.
Create a new directory and set its striping so that new files in it use the SSD pool:
username.ui@ondemand ~ * mkdir faster
username.ui@ondemand ~ * lfs setstripe --pool lfs.ssd faster
username.ui@ondemand ~ * lfs getstripe faster
faster
stripe_count: 1 stripe_size: 1048576 pattern: raid0 stripe_offset: -1 pool: ssd
username.ui@ondemand ~ *
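A file's layout is fixed when the file is created, so only files newly created in that directory are placed on the SSDs. Use 'cp', not 'mv', to put data there: 'mv' within the same file system only renames a file and leaves its data on the hard drives.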
cp -r ~/some/data ~/faster/
The SSD pool is small, so please put only small datasets or binary files (for example, Python virtual environments or Miniconda installations) there.
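Copying and then checking the result before removing the original can be wrapped in a small helper. A minimal POSIX shell sketch, assuming the destination directory was already created with lfs setstripe --pool as shown above; copy_to_ssd is a hypothetical name, and it deliberately leaves deleting the original to you.

```shell
# copy_to_ssd SRC DEST: copy SRC (a file or a directory tree) into DEST,
# a directory assumed to carry the SSD pool striping.
# cp creates new files, so they take DEST's layout; mv would only
# rename them and leave their data on the hard drives.
# Pass SRC without a trailing slash: some cp versions then copy only
# the directory's contents.
copy_to_ssd() {
    src=$1
    dest=$2
    cp -R -- "$src" "$dest/" || return 1
    # Non-zero status if the copy differs from the original.
    diff -r -- "$src" "$dest/$(basename -- "$src")" > /dev/null
}
```

For example, copy_to_ssd ~/some/data ~/faster && rm -r ~/some/data; afterwards, lfs getstripe ~/faster/data should report the ssd pool for the copied files.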
On Lustre filesystems, if multiple processes try to open the same file(s) at the same time, some processes may not be able to find the file(s), which can cause your job to fail.
One remedy is to modify the source code so that a process that fails to open a file sleeps briefly and then retries. This reduces the number of simultaneous attempts by different processes to access the same file.
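integer :: ierr, ntries      ! declare alongside your other variables
ntries = 0
100 open(unit,file='filename',STATUS='OLD',IOSTAT=ierr)   ! 'OLD': fail if the file is not found
if (ierr.ne.0) then
   ...
   ntries = ntries + 1
   if (ntries.ge.10) stop 'cannot open filename'   ! give up instead of looping forever
   call sleep(1)   ! wait 1 s; sleep is a compiler extension (e.g. gfortran)
   go to 100
endif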
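If the program that reads the shared file is started from a job script, the same sleep-and-retry idea can also be applied in the script before launch. A POSIX shell sketch; the name wait_for_file, the 1-second delay, and the default of 10 attempts are illustrative assumptions, not Falcon settings.

```shell
# wait_for_file FILE [MAX_TRIES]: return 0 once FILE can be opened for
# reading, sleeping 1 s between attempts; return 1 after MAX_TRIES
# failed attempts (default 10).
wait_for_file() {
    f=$1
    max=${2:-10}
    n=1
    # The subshell tries to open FILE for reading and exits non-zero if
    # it cannot; using a subshell keeps a failed redirection from
    # terminating the calling shell.
    until ( : < "$f" ) 2>/dev/null; do
        if [ "$n" -ge "$max" ]; then
            echo "wait_for_file: cannot open $f after $max attempts" >&2
            return 1
        fi
        n=$((n + 1))
        sleep 1
    done
}
```

For example, wait_for_file input.dat 30 && ./my_program input.dat, where input.dat and my_program are placeholders. Like the Fortran loop, this only reduces collisions; it does not check that the file is complete.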
When opening a read-only file in Fortran, specify ACTION='READ' rather than relying on the default, which for many compilers is ACTION='READWRITE'. A read-only open reduces contention because it does not lock the file.
open(unit,file='filename',ACTION='READ',IOSTAT=ierr)
Opening and closing files incurs overhead, so avoid opening and closing the same file repeatedly.
If you intend to open the files for read only, make sure to use ACTION='READ' (or the equivalent for the language you are using) in the open statement. If possible, read the files once each and save the results, instead of reading the files repeatedly.
If you intend to write to a file many times during a run, open the file once at the beginning of the run. When all writes are done, close the file at the end of the run.
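The same two rules can be sketched in POSIX shell, where redirections behave like open and close calls: the input file is opened once, read-only, for the whole loop, and the output file is opened once on descriptor 3 and closed after the last write. number_lines is a hypothetical example function; in a compiled program the equivalent is a single OPEN before the loop and a single CLOSE after it.

```shell
# number_lines IN OUT: write each line of IN to OUT, prefixed by its
# line number, opening each file only once.
number_lines() {
    in=$1
    out=$2
    n=0
    # Open OUT once, for writing (truncating it), on descriptor 3.
    exec 3> "$out"
    # The '< "$in"' after 'done' opens IN once, read-only, for the
    # whole loop, so the file is read in a single pass.
    while IFS= read -r line; do
        n=$((n + 1))
        # Each write reuses descriptor 3; OUT is not reopened.
        printf '%d: %s\n' "$n" "$line" >&3
    done < "$in"
    # Close OUT once, after all writes are done.
    exec 3>&-
}
```

Putting >> "$out" inside the loop body instead would open and close the output file on every iteration.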