# Introductions
## Thank the sponsors
### Klaviyo
I looked at the website, and I still have no clue what you guys do,
but I am pretty sure it can help me make money... somehow...
### Here
A technology company that doesn't end in ".com"!
## Who am I?
B.S. & M.S. in Meteorology from Penn State
- learned Perl, C/C++, MATLAB & a terrible language called GrADS
- discovered Linux and Vim and never looked back
- so many software tools for meteorologists are written for Linux
Spent 5 years in PhD hell at the University of Oklahoma
- Ran into all sorts of problems with MATLAB
- Colleague suggested that I try Matplotlib & Python
- Within a couple of months, I managed to convert every single
MATLAB script of mine into Python
- Kept on developing my research using those tools, and posting
patches to the various mailing lists (pre-GitHub days).
Matplotlib Developer
- Finally, John Hunter of Matplotlib said I annoyed him enough
that he gave me commit rights.
- Personally managed the v1.1.0 release of Matplotlib
- Very active on the mpl mailing list
The matplotlib mailing list is directly responsible for two things:
-- Me not finishing my dissertation
-- Current job
- http://matplotlib.1069221.n5.nabble.com/How-matplotlib-got-me-a-job-td27470.html
- tl;dr: answer questions on mailing lists, you never know if the person
you are helping out will be your next co-worker!
Author
- The Anatomy of Matplotlib Tutorial: https://github.com/WeatherGod/AnatomyOfMatplotlib
- Interactive Applications using Matplotlib: http://www.amazon.com/Interactive-Applications-using-Matplotlib-Benjamin/dp/1783988843/ref=tmm_pap_title_0
SciPy Conference Co-chair
- Financial Aid (2015)
- Tutorials (2016)
Scientific Programmer at Atmospheric and Environmental Research, Inc.
## What is AER?
We are a research-oriented company focusing on observational data
of all kinds -- surface, radar, hydrological and satellite data.
We also work closely with many companies around the world to study how
the weather and climate impact their business.
We are primarily a Python shop, and our researchers use many other tools
as well. We have a mix of Linux, Mac and Windows environments.
We are looking to hire a few devops people with Linux experience!
You don't need to know a thing about meteorology
-- just have a willingness to learn Python!
# Overview
- TODO: A good image
# NumPy
- *The* numerical N-dimensional array object for Python
- Completely different from the "array" type in Python's standard library
- array is geared towards having a compact form of lists (great for buffers)
- NumPy is for Math!
- Implemented mostly in C, so it is fast
- Just a C-pointer with lots of dressing!
- Can interface very well with many existing numerical tools
(BLAS, ATLAS, FFTW, etc.)
- Has size, shape, ndim, and dtype
- Ex: NumPy math vs. Python math with loops
- Ex: Almost behaves like a hybrid python list/tuple
- Can modify contents (mutable)
- Cannot change its size
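The two "Ex:" items above could be sketched roughly as follows. This is only an illustration: the sample values and variable names are made up for the example, and the "no append" check is just one way to show that an array keeps a fixed size.

```python
import numpy as np

# Python math with loops: visit the elements one at a time.
values = [1.0, 2.0, 3.0, 4.0]
squares_loop = [v * v + 1.0 for v in values]

# NumPy math: one vectorized expression, the loop happens in C.
arr = np.array(values)
squares_np = arr * arr + 1.0

# Every array carries size, shape, ndim, and dtype.
grid = np.zeros((3, 4), dtype=np.float64)
info = (grid.size, grid.shape, grid.ndim, grid.dtype)

# Mutable like a list: contents can be changed in place...
grid[0, 0] = 42.0

# ...but fixed in length like a tuple: arrays have no append() method.
has_append = hasattr(grid, "append")
```

For large arrays the vectorized form is usually much faster than the loop, since the per-element work is not done by the Python interpreter.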
# SciPy
- Goes beyond basic math algorithms
- Has general algorithms that apply to many domains
- cluster (e.g., kmeans)
- fftpack (goes way beyond numpy.fft)
- integrate (not just trapezoidal)
- interpolation
- There are interpolators for unstructured arbitrary data (general purpose)
and there are highly efficient interpolators for structured gridded
data. Too many people just use interp1d/interp2d/interpnd when they
don't need to. Read the documentation and find the one that would
work best for your data!
Ex: interp2d versus RectBivariateSpline
- io
- support for some scientific data formats
- this used to be essential before the days of pip/easy_install
- matlab, fortran, netcdf3, arff, IDL, wav, harwell_boeing
- ndimage
- convolutions, filters, transforms
- ndimage.measurements (stats on labeled pixels)
- ndimage.morphology (erosions, dilations, etc.)
- For more advanced features, see scikit-image
- spatial
- (c)KDTree (friggin' awesome!)
Ex: Pineapple store example
- stats
- More statistics distributions than you can shake a stick at.
- And that's just the beginning! Check out statsmodels package
for even more statistics fun.
- TODO: Maybe explain a bit what one can do.
- special
- if you ever get a requirement to use a function with a French,
German, or Russian-sounding name, chances are it is in here
- BTW, when scientists talk of Lambda functions, it is usually
not what you think it is. It is in here.
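The interpolation point above (structured versus unstructured interpolators) could be sketched like this. It is a minimal illustration, not the talk's original example: it uses griddata as the general-purpose stand-in, and the grid, the test function z = x + 2*y, and the query point are all assumptions made up for the sketch.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline, griddata

# Structured data: values known on a regular grid, here z = x + 2*y.
x = np.linspace(0.0, 1.0, 11)
y = np.linspace(0.0, 2.0, 21)
z = x[:, np.newaxis] + 2.0 * y[np.newaxis, :]  # shape (11, 21)

# Interpolator built for gridded data (kx=ky=1 means linear in x and y).
spline = RectBivariateSpline(x, y, z, kx=1, ky=1)
z_gridded = spline.ev([0.25], [1.3])[0]

# General-purpose route: flatten the grid into scattered (x, y) points.
# griddata has to triangulate the points before it can interpolate.
xx, yy = np.meshgrid(x, y, indexing="ij")
points = np.column_stack([xx.ravel(), yy.ravel()])
query = np.array([[0.25, 1.3]])
z_scattered = griddata(points, z.ravel(), query, method="linear")[0]
```

Both calls recover the same value here, but the gridded interpolator can exploit the known layout of the data, while the scattered one must first work out which points neighbor each other.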
# Matplotlib
- Cross-platform, interactive-capable, scientific plotting tool
- In science, visualizing data is paramount
- We are visual beings. Our minds grasp things more readily when they
are presented visually.
- A picture is worth a thousand words
- Make an interactive plot in as few as 3 lines of python::

    import matplotlib.pyplot as plt
    plt.plot([2, 5, 3])
    plt.show()

- It is neither stupidly unhelpful nor obtrusive.
- Can have control over as much or as little detail as you want
- Also acts as an abstraction library to many GUI toolkits
- Interactive Applications using Matplotlib: http://www.amazon.com/Interactive-Applications-using-Matplotlib-Benjamin/dp/1783988843/ref=tmm_pap_title_0
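The "as much or as little detail as you want" point could be sketched with the same three data values, this time spelling out the details through the Figure and Axes objects. The labels, colors, and title are made up for the illustration, and the Agg backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Create the figure and axes explicitly instead of relying on defaults.
fig, ax = plt.subplots(figsize=(6, 4))

# Same data as the three-line version, with the styling spelled out.
(line,) = ax.plot([0, 1, 2], [2, 5, 3], color="tab:red", marker="o",
                  linestyle="--", label="observations")

# Control the labels, ticks, grid, and legend individually.
ax.set_xlabel("sample")
ax.set_ylabel("value")
ax.set_title("Same data, more control")
ax.set_xticks([0, 1, 2])
ax.grid(True)
ax.legend(loc="upper left")

# With a GUI backend, plt.show() would open the interactive window here.
```

None of the extra calls are required: dropping any of them falls back to Matplotlib's defaults.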
# IPython / Jupyter
- "Weaponize your tab"
- Copy-n-Paste multi-line code snippets directly into a REPL!
- Notebooks are fantastic for sharing code examples along with results
- Makes it easy to experiment
- Not intended to replace the debugger or be an IDE
- Rather, it is more like a sandbox
# Anaconda/miniconda
- A "python" distribution from Continuum Analytics
(scare quotes because it manages more than just python packages)
- distutils/setuptools isn't good enough for the SciPy community
- this isn't a knock on these tools, but scientists have developed
very useful tools that are not packaged through pip, and may
have very intricate binary dependency requirements.
- So, why don't we just use yum/apt-get/macports/homebrew/(package-manager
-of-the-month)?
- Scientists love to experiment, but they *hate* it when things break.
- They also hate waiting for IT to install the packages they want to try
- Conda provides user-space package management with environment management
- Most dynamic library linkages are at the user-space level, so IT can
update the underlying system with little fear of accidentally breaking
things.
# The rest of the ecosystem
## Visualization
- Bokeh
- seaborn
- cartopy
- basemap (deprecated over the long term; use cartopy for new development)
- ggplot (Grammar of Graphics, ported from R)
- descartes (easy display of geometric shapes)
## Do more with arrays
- pandas
- xarray
- pytables
- numba
- numexpr
- dask
- blaze
- ibis
- odo
## Symbolic Math
- sympy
- sage
## Geographic Processing
- GDAL/OGR (osgeo.ogr/osr)
- fiona
- geopandas
- shapely
## Scientific Data Formats
- NetCDF4
- pytables
- PyNIO
- GDAL (osgeo.gdal)
## Domain-Specialties
- scikit-image
- scikit-learn
- astropy
- statsmodels
# Thoughts
- One of the things I love about the SciPy community is the
co-operative nature of it.
- Projects work to complement each other and "stand on the shoulders
of giants".
- Interesting that this community was predominantly SVN and then Git,
while the rest of the Python community was on Mercurial
- Teaching scientists how to program well: software-carpentry.org