There are two examples shown here, one using R and the other Python. The R example does not produce any output of particular usefulness, but it will help you get acquainted with the conventions of cluster computing. The Python example is closer to useful code, but again it is mostly there to show how to use Python on Falcon.
First step: log into Falcon, either through ondemand or the SSH jump box. If you do not already have an account, you can request one here.
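If you prefer SSH, you can connect through the jump box from a terminal. The command below is only a sketch - substitute your own username and the actual jump box address (a placeholder is shown here):
ssh your.username@<jump-box-address>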
The Getting Started Guide has more information about loading modules and the partitions on Falcon.
In this example we'll simulate thousands of monkeys typing randomly, see how many real words are produced, and create a literary masterpiece along the way.
Log into the ondemand interface and choose Clusters -> staging Shell Access (best) or Clusters -> Falcon Shell Access.
Load the R module and install the R packages stringi and foreach:
boswald.ui@ondemand ~ > module load r
boswald.ui@ondemand ~ > R
R version 4.2.2 (2022-10-31) -- "Innocent and Trusting"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> install.packages("stringi",repos='https://ftp.osuosl.org/pub/cran/')
... some time later...
> install.packages("foreach",repos='https://ftp.osuosl.org/pub/cran/')
...
> quit(save="no")
You only need to install the R packages once, and it must be done from staging or ondemand.c3plus3.org - the cluster compute nodes do not have access to the internet.
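To double-check that the packages installed, you can try loading them from the ondemand shell - if the command returns without errors, they are ready to use:
module load r
Rscript -e 'library(stringi); library(foreach)'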
Next, upload a list of words. Download from here or here
Use the Files -> Home Directory interface in your browser to create a new folder called 'workshop' and upload the data file.
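If you would rather work from the shell, the folder can also be created there; the scp line is only an illustration with placeholder paths, run from your own computer:
mkdir -p ~/workshop
# from your local machine, for example:
# scp engmix.txt your.username@<jump-box-address>:workshop/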
The R script (save it as monkey.R):
#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
if (length(args)==0) {
stop("You must provide an output file name", call.=FALSE)
}
library(stringi)
library(foreach)
nwords = 1000
maxlength = 10
# generate nwords random lowercase strings at each length from 1 to maxlength
wlist = foreach(i=1:maxlength) %do% stri_rand_strings(nwords, i, '[a-z]')
#load word dictionary from file
real_words = read.table("~/workshop/engmix.txt", header = FALSE, sep = "", dec = ".")
#test if the words in wlist are real words
found_words = c()
# check lengths 2 through maxlength (the single-letter strings are skipped)
for(wl in 2:maxlength){
  for(wi in 1:nwords){
    if (any(real_words$V1 == wlist[[wl]][wi]))
    {
      found_words = append(found_words, wlist[[wl]][wi])
    }
  }
}
#open a file to save found words
fileConn<-file(args[1])
writeLines(found_words, fileConn)
close(fileConn)
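Before handing this to SLURM, you can give the script a quick test run from the ondemand shell (this assumes you saved it as monkey.R in your home directory and that engmix.txt is already in ~/workshop):
module load r
Rscript --vanilla monkey.R test.txt
cat test.txt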
The SLURM script (monkey.slurm)
#!/bin/bash
#SBATCH -p short
cd $SLURM_SUBMIT_DIR
module load r
Rscript --vanilla monkey.R m$SLURM_JOB_ID.txt
Submit the script
sbatch monkey.slurm
Check to see if your job is running
squeue --me
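When the job finishes, the words that monkey found are in a file named after the job ID, so you can peek at them with:
cat m*.txt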
It is easy enough to start a few monkeys typing - just repeat the sbatch command - but what if we want a thousand monkeys? Time for an array job. We just need to modify the SLURM submit script (save it as monkeys.slurm):
#!/bin/bash
#SBATCH -p tiny
cd $SLURM_SUBMIT_DIR
module load r
Rscript --vanilla monkey.R m$SLURM_ARRAY_JOB_ID.$SLURM_ARRAY_TASK_ID.txt
Now submit a thousand monkeys at once:
sbatch -a 1-1000 monkeys.slurm
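If you would rather not flood the queue, SLURM lets you cap how many array tasks run at once by adding a % limit to the array range - for example, at most 100 monkeys typing at any one time:
sbatch -a 1-1000%100 monkeys.slurm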
Retrieve our Shakespeare-esque work with a little Bash magic (replace the job ID below with yours); this grabs a random word from each output file:
for fn in {1..1000}; do printf "%s " $(shuf -n1 m48290.$fn.txt); done
Save this masterpiece to a file (again replace the job id below with yours):
for fn in {1..1000}; do printf "%s " $(shuf -n1 m48290.$fn.txt) >> shakespeare.txt; done
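To see how productive the monkeys were, a couple of ordinary shell commands will do (nothing cluster specific here):
wc -w shakespeare.txt
cat m*.txt | sort | uniq -c | sort -rn | head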
For the Python example, start by creating a virtual environment and activating it:
boswald.ui@ondemand ~ * virtualenv one-o-one
Using base prefix '/usr'
New python executable in /lfs/boswald.ui/one-o-one/bin/python3.6
Also creating executable in /lfs/boswald.ui/one-o-one/bin/python
Installing setuptools, pip, wheel...done.
boswald.ui@ondemand ~ * source one-o-one/bin/activate
Now install the needed packages:
(one-o-one) boswald.ui@ondemand ~ * pip3 install tensorflow tensorflow_datasets numpy matplotlib
Collecting tensorflow
Downloading tensorflow-2.6.2-cp36-cp36m-manylinux2010_x86_64.whl (458.3 MB)
...
much output later
...
Building wheel for wrapt (setup.py) ... done
Created wheel for wrapt: filename=wrapt-1.12.1-cp36-cp36m-linux_x86_64.whl size=73055 sha256=f1c3c0250657f61b5693308939dd5fbabd127c691c9b403df6c1e3a115aa361b
Stored in directory: /lfs/boswald.ui/.cache/pip/wheels/32/42/7f/23cae9ff6ef66798d00dc5d659088e57dbba01566f6c60db63
Successfully built clang termcolor wrapt
Installing collected packages: urllib3, pyasn1, idna, charset-normalizer, certifi, zipp, typing-extensions, six, rsa, requests, pyasn1-modules, oauthlib, cachetools, requests-oauthlib, importlib-metadata, google-auth, dataclasses, werkzeug, tensorboard-plugin-wit, tensorboard-data-server, protobuf, numpy, markdown, grpcio, google-auth-oauthlib, cached-property, absl-py, wrapt, termcolor, tensorflow-estimator, tensorboard, python-dateutil, pyparsing, pillow, opt-einsum, kiwisolver, keras-preprocessing, keras, h5py, google-pasta, gast, flatbuffers, cycler, clang, astunparse, tensorflow, matplotlib
Successfully installed absl-py-0.15.0 astunparse-1.6.3 cached-property-1.5.2 cachetools-4.2.4 certifi-2022.12.7 charset-normalizer-2.0.12 clang-5.0 cycler-0.11.0 dataclasses-0.8 flatbuffers-1.12 gast-0.4.0 google-auth-1.35.0 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.48.2 h5py-3.1.0 idna-3.4 importlib-metadata-4.8.3 keras-2.6.0 keras-preprocessing-1.1.2 kiwisolver-1.3.1 markdown-3.3.7 matplotlib-3.3.4 numpy-1.19.5 oauthlib-3.2.2 opt-einsum-3.3.0 pillow-8.4.0 protobuf-3.19.6 pyasn1-0.4.8 pyasn1-modules-0.2.8 pyparsing-3.0.9 python-dateutil-2.8.2 requests-2.27.1 requests-oauthlib-1.3.1 rsa-4.9 six-1.15.0 tensorboard-2.6.0 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-2.6.2 tensorflow-estimator-2.6.0 termcolor-1.1.0 typing-extensions-3.7.4.3 urllib3-1.26.14 werkzeug-2.0.3 wrapt-1.12.1 zipp-3.6.0
(one-o-one)
All installed! Now make a directory to keep things organized:
(one-o-one) boswald.ui@ondemand ~ * mkdir one
(one-o-one) boswald.ui@ondemand ~ * cd one
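The virtual environment only needs to be created once. In later sessions, reactivate it before working and deactivate it when you are done:
source ~/one-o-one/bin/activate
deactivate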
The Python script to train the model, saved as 'sentiment.train.py', follows the TensorFlow example here. The script file can be created through the console by copy/pasting or through the ondemand interface. Update the path to the TensorFlow datasets to match your own directory.
#!/bin/python3
import numpy as np
import tensorflow_datasets as tfds
import tensorflow as tf
import argparse
import os
import matplotlib.pyplot as plt
parser = argparse.ArgumentParser(description="Trains and saves a tensorflow-keras model for sentiment analysis")
parser.add_argument("-j","--jobid",help="the slurm jobid or other unique number",required=False,default="00000")
args = parser.parse_args()
tfds.disable_progress_bar()
dataset, info = tfds.load('imdb_reviews', data_dir='/lfs/boswald.ui/tensorflow_datasets', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
BUFFER_SIZE = 10000
BATCH_SIZE = 64
train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
VOCAB_SIZE = 1000
encoder = tf.keras.layers.experimental.preprocessing.TextVectorization(
    max_tokens=VOCAB_SIZE)
encoder.adapt(train_dataset.map(lambda text, label: text))
model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
history = model.fit(train_dataset, epochs=3,
                    validation_data=test_dataset,
                    validation_steps=30)
test_loss, test_acc = model.evaluate(test_dataset)
print('Test Loss:', test_loss)
print('Test Accuracy:', test_acc)
model.save_weights("sentiment.ckpt"+args.jobid)
#export a plot of the training
def plot_graphs(history, metric):
    plt.plot(history.history[metric])
    plt.plot(history.history['val_'+metric], '')
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend([metric, 'val_'+metric])
plt.figure(figsize=(16, 8))
plt.subplot(1, 2, 1)
plot_graphs(history, 'accuracy')
plt.ylim(None, 1)
plt.subplot(1, 2, 2)
plot_graphs(history, 'loss')
plt.ylim(0, None)
plt.savefig(os.getcwd()+"/training_results.png", format='png', dpi=150)
#this also saves the full model, but it is buggy and the saved model can't be loaded again:
model.save('sentiment.'+args.jobid)
exit()
Cluster compute nodes do not have internet access, so you need to download any data before submitting the job. From the ondemand shell, start Python and fetch the IMDB reviews dataset:
(one-o-one) boswald.ui@ondemand ~/one * python3
Python 3.6.8 (default, Apr 12 2022, 06:55:39)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-10)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow_datasets as tfds
2023-02-07 10:47:21.440650: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-02-07 10:47:21.440686: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
>>>
>>> BATCH_SIZE = 64
>>> train_ds = tfds.load('imdb_reviews', split='train[:80%]', batch_size=BATCH_SIZE, shuffle_files=True, as_supervised=True)
>>> exit()
(one-o-one) boswald.ui@ondemand ~/one *
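The same download can also be done non-interactively; this one-liner is just an alternative sketch, with data_dir pointing at the directory the training script expects (adjust the path to your own):
python3 -c "import tensorflow_datasets as tfds; tfds.load('imdb_reviews', data_dir='$HOME/tensorflow_datasets')"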
Next, create the job submission script (sentiment.slurm):
#!/bin/bash
#SBATCH -p short
cd $SLURM_SUBMIT_DIR
hostname
source ~/one-o-one/bin/activate
START=$(date +%s)
python3 sentiment.train.py -j $SLURM_JOBID
let RUNTIME=$(date +%s)-$START
echo "Training time: $RUNTIME"
echo "*--done--*"
Now submit the job
(one-o-one) boswald.ui@ondemand ~/one * sbatch sentiment.slurm
Submitted batch job 10092
(one-o-one) boswald.ui@ondemand ~/one * squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
10092 short sentimen boswald. R 0:07 1 r3i5n0
(one-o-one) boswald.ui@ondemand ~/one *
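While the job runs (and after it finishes), the script's output lands in a slurm-<jobid>.out file in the submit directory, so you can watch the training progress with:
tail -f slurm-10092.out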
Now let's use the model we trained. First, create the inference file (inference.py):
#!/bin/python3
import numpy as np
import tensorflow_datasets as tfds
import tensorflow as tf
import argparse
import code
parser = argparse.ArgumentParser(description="Loads a trained tensorflow-keras sentiment model and provides an interactive inference prompt")
parser.add_argument("-j","--jobid",help="the slurm jobid or other unique number",required=False,default="00000")
args = parser.parse_args()
tfds.disable_progress_bar()
dataset, info = tfds.load('imdb_reviews', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
BUFFER_SIZE = 10000
BATCH_SIZE = 64
train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).prefetch(tf.data.experimental.AUTOTUNE)
test_dataset = test_dataset.batch(BATCH_SIZE).prefetch(tf.data.experimental.AUTOTUNE)
VOCAB_SIZE = 1000
encoder = tf.keras.layers.experimental.preprocessing.TextVectorization(
    max_tokens=VOCAB_SIZE)
encoder.adapt(train_dataset.map(lambda text, label: text))
model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        input_dim=len(encoder.get_vocabulary()),
        output_dim=64,
        # Use masking to handle the variable sequence lengths
        mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
model.load_weights("sentiment.ckpt"+args.jobid)
print("#--------------------------------------------------------------------#")
print("\nUse the infer function to analyze some text. For example:\ninfer('this is the text to analyze sentiment in',model) \n negative numbers indicate negative sentiment, positive numbers positive sentiment\n")
def infer(thetext, mdl):
    predicts = mdl.predict(np.array([thetext]))
    print("sentiment: "+str(predicts[0]))
code.interact(local=locals())
exit()
Run the file, passing the job ID of the run that trained the model:
(one-o-one) boswald.ui@ondemand ~/one * ls
checkpoint sentiment.10096 sentiment.ckpt10096.data-00000-of-00001 sentiment.ckpt10096.index sentiment.slurm sentiment.train.py slurm-10092.out slurm-10095.out slurm-10096.out training_results.png
(one-o-one) boswald.ui@ondemand ~/one * nano inference.py
(one-o-one) boswald.ui@ondemand ~/one * python3 inference.py -j 10096
2023-02-07 11:09:47.506250: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-02-07 11:09:47.506289: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-02-07 11:09:53.298806: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-02-07 11:09:53.299341: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-02-07 11:09:53.299401: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ondemand): /proc/driver/nvidia/version does not exist
2023-02-07 11:09:53.302200: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-07 11:09:54.181238: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
#--------------------------------------------------------------------#
Use the infer function to analyze some text. For example:
infer('this is the text to analyze sentiment in',model)
negative numbers indicate negative sentiment, positive numbers positive sentiment
Python 3.6.8 (default, Apr 12 2022, 06:55:39)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-10)] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> infer('some text to analyze here',model)
sentiment: [-0.15224136]
>>> infer('happy day, a good movie, fun for all',model)
sentiment: [0.9714721]
>>> infer('i hate apples, they taste like sand',model)
sentiment: [-0.13639668]
>>> infer('puppies are cute - especially when playing with a ball',model)
sentiment: [0.36142796]
>>> exit()
(one-o-one) boswald.ui@ondemand ~/one *