Commit 39a1d19

Julian Kates-Harbeck authored and committed
2 parents 6bddbf3 + e839dad

24 files changed: +485 −154 lines

README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -1,4 +1,4 @@
-# FRNN [![Build Status](https://travis-ci.org/PPPLDeepLearning/plasma-python.svg?branch=master)](https://travis-ci.org/PPPLDeepLearning/plasma-python.svg?branch=master)
+# FRNN [![Build Status](https://travis-ci.org/PPPLDeepLearning/plasma-python.svg?branch=master)](https://travis-ci.org/PPPLDeepLearning/plasma-python.svg?branch=master) [![Build Status](https://jenkins.princeton.edu/buildStatus/icon?job=FRNM/PPPL)](https://jenkins.princeton.edu/job/FRNM/job/PPPL/)
 
 ## Package description
````

data/signals.py

Lines changed: 8 additions & 7 deletions
````diff
@@ -213,14 +213,14 @@ def fetch_nstx_data(signal_path,shot_num,c):
     'pradedge':pradedge,'pradtot':pradtot,'pin':pin,
     'torquein':torquein,
     'energydt':energydt,'ipdirect':ipdirect,'iptarget':iptarget,'iperr':iperr,
-    'tmamp1':tmamp1,'tmamp2':tmamp2,'tmfreq1':tmfreq1,'tmfreq2':tmfreq2,'pechin':pechin,
+    #'tmamp1':tmamp1,'tmamp2':tmamp2,'tmfreq1':tmfreq1,'tmfreq2':tmfreq2,'pechin':pechin,
     # 'rho_profile_spatial':rho_profile_spatial,'etemp':etemp,
-    'etemp_profile':etemp_profile,'edens_profile':edens_profile,
-    'itemp_profile':itemp_profile,'zdens_profile':zdens_profile,
-    'trot_profile':trot_profile,'pthm_profile':pthm_profile,
-    'neut_profile':neut_profile,'q_profile':q_profile,
-    'bootstrap_current_profile':bootstrap_current_profile,
-    'q_psi_profile':q_psi_profile}
+    'etemp_profile':etemp_profile,'edens_profile':edens_profile}
+    #'itemp_profile':itemp_profile,'zdens_profile':zdens_profile,
+    #'trot_profile':trot_profile,'pthm_profile':pthm_profile,
+    #'neut_profile':neut_profile,'q_profile':q_profile,
+    #'bootstrap_current_profile':bootstrap_current_profile,
+    #'q_psi_profile':q_psi_profile}
     #}
 
 #new signals are not downloaded yet
@@ -236,6 +236,7 @@ def fetch_nstx_data(signal_path,shot_num,c):
 print(all_signals.values())
 
 fully_defined_signals = {sig_name: sig for (sig_name, sig) in all_signals_restricted.items() if sig.is_defined_on_machines(all_machines)}
+fully_defined_signals_0D = {sig_name: sig for (sig_name, sig) in all_signals_restricted.items() if ( sig.is_defined_on_machines(all_machines) and sig.num_channels == 1) }
 d3d_signals = {sig_name: sig for (sig_name, sig) in all_signals_restricted.items() if sig.is_defined_on_machine(d3d)}
 jet_signals = {sig_name: sig for (sig_name, sig) in all_signals_restricted.items() if sig.is_defined_on_machine(jet)}
 jet_signals_0D = {sig_name: sig for (sig_name, sig) in all_signals_restricted.items() if (sig.is_defined_on_machine(jet) and sig.num_channels == 1)}
````
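The comprehensions above filter a signal registry down to the signals a given set of machines defines, with the `_0D` variant additionally requiring single-channel (scalar) signals. A minimal toy sketch of that pattern, with a stand-in `Signal` class (all names and values here are illustrative, not the package's real API):

```python
# Toy sketch of the filtering pattern above. Signal is a minimal stand-in
# for the package's signal class; machine names and channel counts are made up.
class Signal:
    def __init__(self, machines, num_channels=1):
        self.machines = set(machines)
        self.num_channels = num_channels

    def is_defined_on_machine(self, machine):
        return machine in self.machines

    def is_defined_on_machines(self, machines):
        return all(m in self.machines for m in machines)

all_signals = {
    "ip": Signal({"jet", "d3d"}),                              # 0D, both machines
    "etemp_profile": Signal({"jet", "d3d"}, num_channels=32),  # 1D profile
    "pechin": Signal({"d3d"}),                                 # 0D, D3D only
}

all_machines = ["jet", "d3d"]
# Keep only scalar signals defined on every machine.
fully_defined_signals_0D = {
    name: sig for name, sig in all_signals.items()
    if sig.is_defined_on_machines(all_machines) and sig.num_channels == 1
}
print(sorted(fully_defined_signals_0D))  # → ['ip']
```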

docs/PrincetonUTutorial.md

Lines changed: 58 additions & 18 deletions
````diff
@@ -1,30 +1,57 @@
 ## Tutorials
 
+### Login to Tigergpu
+
+First, login to the TigerGPU cluster headnode via ssh:
+```
+ssh -XC <yourusername>@tigergpu.princeton.edu
+```
+
 ### Sample usage on Tigergpu
 
-First, create an isolated Anaconda environment and load CUDA drivers:
+Next, check out the source code from github:
 ```
-module load anaconda3
-module load cudatoolkit/8.0 cudnn/cuda-8.0/6.0 openmpi/cuda-8.0/intel-17.0/2.1.0/64 intel/17.0/64/17.0.2.174
-module load intel/17.0/64/17.0.4.196 intel-mkl/2017.3/4/64
+git clone https://github.com/PPPLDeepLearning/plasma-python
+cd plasma-python
+```
+
+After that, create an isolated Anaconda environment and load CUDA drivers:
+```
+#cd plasma-python
+module load anaconda3/4.4.0
 conda create --name my_env --file requirements-travis.txt
 source activate my_env
+
+export OMPI_MCA_btl="tcp,self,sm"
+module load cudatoolkit/8.0
+module load cudnn/cuda-8.0/6.0
+module load openmpi/cuda-8.0/intel-17.0/2.1.0/64
+module load intel/17.0/64/17.0.4.196
 ```
 
-Then install the plasma-python package:
+and install the `plasma-python` package:
 
 ```bash
 #source activate my_env
-git clone https://github.com/PPPLDeepLearning/plasma-python
-cd plasma-python
 python setup.py install
 ```
 
 Here `my_env` should contain the Python packages as per the `requirements-travis.txt` file.
 
+#### Common issue
+
+A common issue is an Intel compiler mismatch between the `PATH` and the loaded modules. With the modules loaded as above,
+you should see something like this:
+```
+$ which mpicc
+/usr/local/openmpi/cuda-8.0/2.1.0/intel170/x86_64/bin/mpicc
+```
+
+If you `source activate` the Anaconda environment after loading openmpi, you would pick up the MPI from Anaconda, which could lead to errors.
+
 #### Location of the data on Tigress
 
-The JET and D3D datasets containing multi-modal time series of sensory measurements leading up to deleterious events called plasma disruptions are located on the /tigress filesystem on Princeton U clusters.
+The JET and D3D datasets, containing multi-modal time series of sensory measurements leading up to deleterious events called plasma disruptions, are located on the `/tigress/FRNN` filesystem on Princeton U clusters.
 For convenience, create the following symbolic links:
 
 ```bash
````
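The ordering pitfall flagged in the tutorial's "Common issue" section (activating Anaconda after loading openmpi picks up the wrong MPI) can be sketched as a small check. This helper is hypothetical, not part of the repository; the path strings are only illustrative:

```python
# Hypothetical helper: classify where an mpicc path would come from, to catch
# the Anaconda-vs-module mismatch described in the tutorial diff above.
def mpicc_provenance(path):
    if not path:
        return "missing"   # no mpicc on PATH: load the openmpi module first
    if "anaconda" in path or "conda" in path:
        return "conda"     # Anaconda's bundled MPI: likely to cause errors
    return "system"        # module-provided OpenMPI: the expected case

print(mpicc_provenance(
    "/usr/local/openmpi/cuda-8.0/2.1.0/intel170/x86_64/bin/mpicc"))  # → system
```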
````diff
@@ -39,16 +66,22 @@ ln -s /tigress/FRNN/signal_data signal_data
 cd examples/
 python guarantee_preprocessed.py
 ```
-This will preprocess the data and save it in `/tigress/<netid>/processed_shots` and `/tigress/<netid>/normalization`
+This will preprocess the data and save it in `/tigress/<netid>/processed_shots`, `/tigress/<netid>/processed_shotlists` and `/tigress/<netid>/normalization`
 
+You only have to run preprocessing once per dataset. The dataset is specified in the config file `examples/conf.yaml`:
+```yaml
+paths:
+    data: jet_data_0D
+```
+Preprocessing takes about 20 minutes in parallel and can normally be done on the cluster headnode.
 
 #### Training and inference
 
-Use the Slurm scheduler to perform batch or interactive analysis on the Tiger cluster.
+Use the Slurm scheduler to perform batch or interactive analysis on the TigerGPU cluster.
 
 ##### Batch analysis
 
-For batch analysis, make sure to allocate 1 process per GPU:
+For batch analysis, make sure to allocate 1 MPI process per GPU. Save the following to a `slurm.cmd` file (or modify the existing `examples/slurm.cmd`):
 
 ```bash
 #!/bin/bash
````
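As a quick illustration of the `/tigress` layout the preprocessing step writes to, here is a sketch that builds the three output directories for a given NetID. It is purely illustrative; `guarantee_preprocessed.py` derives its paths from the config file, not from a helper like this:

```python
# Illustrative only: where preprocessing output lands for a given NetID,
# per the /tigress layout described in the tutorial above.
from pathlib import Path

def preprocess_output_dirs(netid, base="/tigress"):
    root = Path(base) / netid
    return [root / name for name in
            ("processed_shots", "processed_shotlists", "normalization")]

print([p.as_posix() for p in preprocess_output_dirs("<netid>")])
```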
````diff
@@ -58,15 +91,20 @@ For batch analysis, make sure to allocate 1 process per GPU:
 #SBATCH --ntasks-per-socket=2
 #SBATCH --gres=gpu:4
 #SBATCH -c 4
+#SBATCH --mem-per-cpu=0
 
-module load anaconda3
+module load anaconda3/4.4.0
 source activate my_env
-module load cudatoolkit/8.0 cudnn/cuda-8.0/6.0 openmpi/cuda-8.0/intel-17.0/2.1.0/64 intel/17.0/64/17.0.2.174
-module load intel/17.0/64/17.0.4.196 intel-mkl/2017.3/4/64
+export OMPI_MCA_btl="tcp,self,sm"
+module load cudatoolkit/8.0
+module load cudnn/cuda-8.0/6.0
+module load openmpi/cuda-8.0/intel-17.0/2.1.0/64
+module load intel/17.0/64/17.0.4.196
+
 srun python mpi_learn.py
 
 ```
-where X is the number of nodes for distributed training.
+where `X` is the number of nodes for distributed training.
 
 Submit the job with:
 ```bash
````
````diff
@@ -82,11 +120,11 @@ Optionally, add an email notification option in the Slurm script about the job completion.
 
 ##### Interactive analysis
 
-The interactive option is preferred for debugging or running in the notebook; in all other cases, batch is preferred.
+The interactive option is preferred for **debugging** or running in the **notebook**; in all other cases, batch is preferred.
 The workflow is to request an interactive session:
 
 ```bash
-salloc -N [X] --ntasks-per-node=4 --ntasks-per-socket=2 --gres=gpu:4 -t 0-6:00
+salloc -N [X] --ntasks-per-node=4 --ntasks-per-socket=2 --gres=gpu:4 -c 4 --mem-per-cpu=0 -t 0-6:00
 ```
 where the number of GPUs is X * 4.
````
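The resource arithmetic in the tutorial (one MPI process per GPU, four GPUs per TigerGPU node, so `X * 4` GPUs in total) can be sketched as follows; the helper names are illustrative, not part of the repository:

```python
# Sketch of the Slurm resource arithmetic above: TigerGPU nodes carry 4 GPUs,
# and the scripts request one MPI process per GPU (--ntasks-per-node=4).
def total_gpus(nodes, gpus_per_node=4):
    return nodes * gpus_per_node

def total_mpi_tasks(nodes, ntasks_per_node=4):
    return nodes * ntasks_per_node

# For a 2-node allocation, both counts match: 8 GPUs, 8 MPI processes.
print(total_gpus(2), total_mpi_tasks(2))  # → 8 8
```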

````diff
@@ -104,7 +142,7 @@ Currently, FRNN is capable of working with JET and D3D data as well as cross-machine regimes.
 ```yaml
 paths:
     ...
-    data: 'jet_data'
+    data: 'jet_data_0D'
 ```
 Use `d3d_data` for D3D signals, and `jet_to_d3d_data` or `d3d_to_jet_data` for the cross-machine regime.
 
````
````diff
@@ -116,6 +154,8 @@ paths:
 ```
 If left empty `[]`, all valid signals defined on a machine will be used. Only use this if you need a custom set.
 
+Other parameters configured in `conf.yaml` include the batch size, learning rate, neural network topology and special conditions for hyperparameter sweeps.
+
 ### Current signals and notations
 
 Signal name | Description
````

examples/compare_batch_iterators.py

Lines changed: 0 additions & 75 deletions
This file was deleted.

examples/compare_performance.py

Lines changed: 15 additions & 7 deletions
````diff
@@ -43,12 +43,20 @@
 shots = analyzers[0].shot_list_test
 
 for shot in shots:
-    types = [analyzers[i].get_prediction_type_for_individual_shot(P_threshs[i],shot,mode='test') for i in range(len(analyzers))]
-    #if len(set(types)) > 1:
-    if types == ['TP','FN']:
-        print(shot.number)
-        print(types)
-        for i,analyzer in enumerate(analyzers):
-            analyzer.save_shot(shot,P_thresh_opt=P_threshs[i],extra_filename=['deep','shallow'][i])
+    if all([(shot in analyzer.shot_list_test or shot in analyzer.shot_list_train) for analyzer in analyzers]):
+        types = [analyzers[i].get_prediction_type_for_individual_shot(P_threshs[i],shot,mode='test') for i in range(len(analyzers))]
+        #if len(set(types)) > 1:
+        if types == ['TP','late']:
+            if shot in analyzers[1].shot_list_test:
+                print("TEST")
+            else:
+                print("TRAIN")
+            print(shot.number)
+            print(types)
+            for i,analyzer in enumerate(analyzers):
+                analyzer.save_shot(shot,P_thresh_opt=P_threshs[i],extra_filename=['1D','0D'][i])
+    else:
+        pass
+        #print("shot {} not in train or test shot list (must be in validation)".format(shot))
 
````
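The new guard in this diff skips shots that any analyzer has never seen, then singles out shots the first model catches (`TP`) while the second flags late. A toy sketch of that pattern with stand-in objects (the real analyzer API is richer; every name here is illustrative):

```python
# Toy stand-in for the comparison loop above: keep only shots known to every
# analyzer, then report those classified TP by the first model and late by
# the second. All classes and data are illustrative.
class ToyAnalyzer:
    def __init__(self, known_shots, predictions):
        self.known_shots = set(known_shots)
        self.predictions = predictions  # shot -> 'TP' | 'late' | 'FN' | ...

    def prediction_type(self, shot):
        return self.predictions[shot]

deep = ToyAnalyzer({1, 2, 3}, {1: "TP", 2: "TP", 3: "FN"})
shallow = ToyAnalyzer({1, 2}, {1: "late", 2: "TP"})

interesting = []
for shot in sorted(deep.known_shots):
    # Skip shots that any analyzer has never seen (shot 3 here).
    if all(shot in a.known_shots for a in (deep, shallow)):
        types = [a.prediction_type(shot) for a in (deep, shallow)]
        if types == ["TP", "late"]:
            interesting.append(shot)

print(interesting)  # → [1]
```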

examples/conf.yaml

Lines changed: 1 addition & 1 deletion
````diff
@@ -10,7 +10,7 @@ paths:
     signal_prepath: '/signal_data/' #/signal_data/jet/
     shot_list_dir: '/shot_lists/'
     tensorboard_save_path: '/Graph/'
-    data: 'jet_data' #'d3d_to_jet_data' #'d3d_to_jet_data' # 'jet_to_d3d_data' #jet_data
+    data: jet_data #'d3d_to_jet_data' #'d3d_to_jet_data' # 'jet_to_d3d_data' #jet_data
     specific_signals: [] #['q95','li','ip','betan','energy','lm','pradcore','pradedge','pradtot','pin','torquein','tmamp1','tmamp2','tmfreq1','tmfreq2','pechin','energydt','ipdirect','etemp_profile','edens_profile'] #if left empty will use all valid signals defined on a machine. Only use if need a custom set
     executable: "mpi_learn.py"
     shallow_executable: "learn.py"
````

examples/guarantee_preprocessed.py

Lines changed: 0 additions & 2 deletions
````diff
@@ -11,8 +11,6 @@
 pprint(conf)
 from plasma.preprocessor.preprocess import guarantee_preprocessed
 
-os.environ["PYTHONHASHSEED"] = "0"
-
 #####################################################
 ####################PREPROCESSING####################
 #####################################################
````
#####################################################

examples/jenkins.sh

Lines changed: 0 additions & 16 deletions
This file was deleted.

examples/mpi_learn.py

Lines changed: 0 additions & 2 deletions
````diff
@@ -24,8 +24,6 @@
 import random
 import numpy as np
 
-os.environ["PYTHONHASHSEED"] = "0"
-
 import matplotlib
 matplotlib.use('Agg')
````

File renamed without changes.
