
Data science tutorial

Shankar Adhikari edited this page Sep 20, 2019 · 23 revisions

Data Science Tutorial; Tools

Version Control

Git and GitHub

Git is a version control system, and GitHub is a platform built on git, where software developers like you can work together and maintain a complete history of the work. Here we discuss the functionality of git and of GitHub.

What functionality do you expect when working with another developer?

  1. Work and share together in the same work area.
  2. Prevent overwriting each other's changes.
  3. Maintain a history of every change.

Let's start with a very basic tutorial on the functionality of GitHub.

  1. Sign up with your email.

  2. Update your profile.

  3. Create your first repository. A repository is equivalent to a folder on your computer where you keep your code; the only difference is that it is online.

  4. Now create a folder on your computer (the local repository).

  5. Now it is time to install git on your computer. Be careful about the difference between git and GitHub: git is software, and GitHub is an online system where you can save your software.

  • installation link

  • Check whether git is installed on your computer: git --version

  • Type git in the command line; if git is installed properly, it should respond with its usage message.

  6. How to push your software to GitHub from your local computer.
  • Initialize your local repository: `git init`
  • Create a README.md file: `echo "# My software" >> README.md`
  • Create a .gitignore file with `touch .gitignore` and list in it whatever you do not want to upload to GitHub, for example a directory: `/dir_name`
  • Next, stage your files with the `git add` command: `git add --all` for everything, or `git add single_file_name` for a single file.
  • Record the staged files in the history by committing: `git commit -m "first commit"`
  • Before sending files from the local repository to GitHub, tell the local repository which GitHub repository it belongs to.
  • How to check the remote? `git remote`

  • How to add a remote? `git remote add origin link_of_your_repo` (origin is an alias for link_of_your_repo)

  • Now send the code to the online repository: `git push -u origin master` (this can also be written as `git push -u <link_of_your_repo> master`; use `main` instead of `master` if that is your repository's default branch)

  • Check the status at each of these steps using `git status`

  7. Set up your name and email: `git config --global user.name 'my name'` and `git config --global user.email 'my email'`
  8. If you have staged a file with `git add` and want to unstage it before pushing, remove it from the index with `git rm --cached 'file name'`
 
9. Copy a repository into a local folder:
`git clone link_of_repo`

10. Get everybody's changes by pulling the repository:

`git pull`

11. If you do not have write access to the repository you want to contribute to, send a pull request.

12. Create another branch, rather than working on the master branch:

`git branch name_of_branch` followed by `git checkout name_of_branch`

How to create a branch in an existing local repository and add new files to it:

git branch <branch_name>
git checkout <branch_name>
git push -u origin <branch_name>   # set upstream
git add <file or directory you want to add>
git commit -m "commit"
git push origin <branch_name>


1. Jupyter Notebook

Jupyter Notebook is an interactive web application that lets you edit and run code, write equations, visualize data, and much more.

Installation

pip3 install jupyter

How to start from terminal?

Type the following command.

jupyter notebook

After executing this command, the Notebook dashboard opens in the web browser at http://localhost:8888/tree. The dashboard is designed to manage all of your notebooks. The notebook is served by a local Python server and displayed in the web browser, which makes the application platform independent.

What can you do in the Jupyter interactive framework?

We will practice these parts in class.

  • Open a new notebook. On the right side of the dashboard you can see the New tab; click the down arrow and choose a Python version to open an empty notebook.
  • Edit the file name, then start typing Python commands in a cell.
  • Run a cell using the play button, or look for the drop-down arrow in the Cell tab (several run options are available).
  • Investigate the other tabs such as Insert, Kernel, View, etc.
  • Shell commands can be executed using an exclamation mark !.
  • Bash-style syntax is possible with $.
  • Magic commands are available in IPython. To see the list of magic commands, type %lsmagic in a notebook cell or an IPython terminal.


2. Numpy

The Numpy package is an 800-pound gorilla: its importance and usability in data science are so large that they are hard to describe. I will try my best to go through as many of its components as possible.

Installation

Go to your conda environment and install by simply typing the following command.

conda install numpy

or, if you do not have conda, use pip:
pip install numpy

Let's start this tutorial by importing the package into your Python environment,

import numpy as np

This tells Python that np is the alias for numpy from here on.

In the following section I list different functionalities of numpy and their corresponding commands.

  • Let's create a numpy array
list1 = [1,2,6]
array1 = np.array(list1)

# also create an array of regularly spaced numbers: start at 3, stop before 27, step 3
np.arange(3,27,3)
  • A strength of numpy is that it works very efficiently on matrices. So keep checking the shape of your arrays, which can be done as,
array1 = [[2,3,5],[4,9,7]]
np.shape(array1)

This is a 2d array with shape (2, 3).

  • How to change a 1d array to 2d, 3d, and so on.
array1 = [2,3,5,4,9,7]
array2 = np.array(array1)
array2 = array2.reshape(2,3)  # reshape returns a new array, so assign the result
np.shape(array2)
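The reshape step can be sketched as follows. This is a minimal illustration rather than part of the original tutorial; the variable names `flat`, `grid`, and `cube` are made up for the example.

```python
import numpy as np

flat = np.array([2, 3, 5, 4, 9, 7])

# reshape returns a new array object; `flat` keeps its original shape
grid = flat.reshape(2, 3)
print(flat.shape)   # (6,)
print(grid.shape)   # (2, 3)

# -1 asks NumPy to infer that dimension from the total number of elements
cube = flat.reshape(1, 2, -1)
print(cube.shape)   # (1, 2, 3)
```

The total number of elements has to stay the same, so six values can become 2 by 3 or 1 by 2 by 3, but not 4 by 2.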

  • Initializing a numpy matrix in different ways,
np.zeros((n,m)) # n by m matrix of zeros
np.eye(n) # n by n identity matrix

  • Matrix multiplication is done with the np.dot function when the two matrices are compatible, i.e. the number of columns of the first matrix equals the number of rows of the second matrix.
A = np.eye(3)
B = np.arange(1,10).reshape(3,3)
C = np.dot(A,B)
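As a rough check of the shape rule for matrix multiplication, the sketch below repeats the identity example and adds a non-square case. The names `D`, `left`, and `right` are illustrative only.

```python
import numpy as np

A = np.eye(3)                        # 3x3 identity matrix
B = np.arange(1, 10).reshape(3, 3)   # values 1..9 in a 3x3 grid

C = np.dot(A, B)
D = A @ B   # the @ operator computes the same matrix product

# Multiplying by the identity leaves B unchanged
print(np.array_equal(C, B))   # True
print(np.array_equal(C, D))   # True

# Shapes must agree: (2, 3) times (3, 4) gives (2, 4)
left = np.ones((2, 3))
right = np.ones((3, 4))
print(np.dot(left, right).shape)   # (2, 4)
```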
  • Make a matrix with random values.
A = np.random.rand(2,3)
print(A)
  • Appending values to an array
A = np.array([4,9])
np.append(A, 2)      # returns a new array [4 9 2]; A itself is unchanged
np.append(A, [3,7])  # returns a new array [4 9 3 7]
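Because np.append returns a copy instead of modifying its input, the result has to be assigned to keep it. A small sketch, with `B` and `C` as illustrative names:

```python
import numpy as np

A = np.array([4, 9])
B = np.append(A, 2)        # new array; A is not modified
C = np.append(B, [3, 7])   # appending a list adds each of its values

print(A)   # [4 9]
print(B)   # [4 9 2]
print(C)   # [4 9 2 3 7]
```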
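  • Another important functionality is broadcasting, which applies an operation with a scalar (or a smaller array) to every element
A = np.array([2,3,4])
# add 2 to each element
B = A + 2
# multiply each element by -1
C = A * -1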
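Broadcasting also works between arrays of different shapes, not only between an array and a scalar. The sketch below is an illustration with made-up arrays `M`, `row`, and `col`.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)    # [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])      # shape (3,)
col = np.array([[100], [200]])    # shape (2, 1)

by_row = M + row   # `row` is added to every row of M
by_col = M + col   # `col` is added to every column of M

print(by_row)
print(by_col)
```

NumPy compares shapes from the last dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.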
  • Some mathematical operations
A = np.arange(1,10).reshape(3,3)
print(A)
print(A.sum(axis=0))  # column sums; A itself is unchanged
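The axis argument decides which direction is collapsed. A short sketch on the same 3 by 3 array, with illustrative variable names:

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)   # [[1 2 3], [4 5 6], [7 8 9]]

column_sums = A.sum(axis=0)    # collapse the rows: one sum per column
row_sums = A.sum(axis=1)       # collapse the columns: one sum per row
total = A.sum()                # no axis: sum of every element
column_means = A.mean(axis=0)

print(column_sums)    # [12 15 18]
print(row_sums)       # [ 6 15 24]
print(total)          # 45
print(column_means)   # [4. 5. 6.]
```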

3. Scipy

Scipy is a scientific and technical computing package for Python. The library offers a large number of computing modules, which cover interpolation, optimization, Fourier transforms, linear algebra, statistics, image processing, and so on. An example using the integration module is as follows.

import scipy.integrate

fun = lambda x: 12*x

result = scipy.integrate.quad(fun, 0, 1)
print (result)
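The value returned by quad is a tuple holding the integral estimate and an estimate of the absolute error. A minimal sketch that unpacks both; the names `value` and `abs_error` are chosen for this example:

```python
import scipy.integrate

# Integrate 12*x from 0 to 1; the exact answer is 6
value, abs_error = scipy.integrate.quad(lambda x: 12 * x, 0, 1)

print(value)       # numerical estimate of the integral
print(abs_error)   # estimated absolute error of that value
```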
  • Optimization and minimization can be done with the scipy.optimize module, for example from scipy.optimize import minimize.
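A possible use of minimize, assuming a simple made-up objective (the function `bowl` is hypothetical and not taken from the course notebook):

```python
import numpy as np
from scipy.optimize import minimize

def bowl(p):
    """Hypothetical objective with its minimum at x = 1, y = -2."""
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

# Start the search from the origin; the default solver is used
result = minimize(bowl, x0=np.array([0.0, 0.0]))

print(result.x)     # location of the minimum found
print(result.fun)   # objective value at that location
```

The returned object also carries `result.success` and `result.message`, which are worth checking before trusting the answer.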

Let's follow more explanation in the Notebook.

  • Curve fitting

  • Interpolation
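Until the notebook is at hand, here is one way curve fitting and interpolation might look in Scipy. This is only a sketch under assumptions: the data are synthetic, the model function `line` is hypothetical, and `CubicSpline` is just one of several interpolators the library provides.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.interpolate import CubicSpline

def line(x, slope, intercept):
    """Hypothetical model used for the fit."""
    return slope * x + intercept

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Curve fitting: recover slope and intercept from noise-free synthetic data
y_line = 2.0 * x + 1.0
params, covariance = curve_fit(line, x, y_line)
slope, intercept = params
print(slope, intercept)

# Interpolation: build a spline through samples of x**2, then evaluate between them
y_curve = x ** 2
spline = CubicSpline(x, y_curve)
midpoint = float(spline(1.5))
print(midpoint)
```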

5. Matplotlib

Start by importing matplotlib in the Python notebook as

import matplotlib.pyplot as plt

Next, start with a basic plot that uses several of its functions,

plt.plot([1,2,3,4],[1,4,9,16])
plt.title("number and square")
plt.xlabel("number")
plt.ylabel("square")
plt.show()

The size of the figure can be set with

plt.figure(figsize=(5,6))

In addition to the x and y values, a third argument sets the style and color; for example, blue solid circles are drawn with,

plt.plot(x, y, 'bo')
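Putting the figure size and the format string together could look like the sketch below. The data are the same squares as in the first plot. The `matplotlib.use("Agg")` line is an assumption made so that the example also runs as a script on a machine without a display; remove it inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line inside a notebook
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [n ** 2 for n in x]

fig = plt.figure(figsize=(5, 6))              # width 5 inches, height 6 inches
points = plt.plot(x, y, "bo", label="data")   # 'b' = blue, 'o' = circle markers
plt.plot(x, y, "r--", label="dashed line")    # 'r' = red, '--' = dashed line
plt.xlabel("number")
plt.ylabel("square")
plt.legend()
```

In a notebook the figure is rendered when the cell finishes; in a script with an interactive backend, call `plt.show()` at the end.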

4. Pandas

6. Seaborn