MsaPythonDataAnalytics/MsaPythonDataAnalytics.qmd at main · Qprop/MsaPythonDataAnalytics

---
title: "Data Analytics Introduction Using Python Training"
jupyter: python3
format:
  revealjs:
    theme: serif
    smaller: false
    scrollable: true
    incremental: false
    transition: concave
    background-transition: fade
    controls: true
    code-fold: true
    code-tools: true
---

# Introduction

## Welcome

We are embarking on a journey with multiple stops and a distant destination. The stops are the areas of expertise we will learn and practice along the way; the destination is the progressive accumulation of that learning.

Because technology keeps evolving, the destination will keep changing too.

------------------------------------------------------------------------

::: {.columns}

::: {.column width="60%"}

### Assumptions {.smaller}

Some limited knowledge of Microsoft Excel functions, e.g. `=sum(C1,C2)`, is helpful: it is not mandatory, but it makes the concepts slightly faster to grasp. If you remember the simple everyday mathematics of adding up money, then you are good :).

:::

::: {.column width="40%"}

### Requirements {.smaller}

- *Interest* and *Desire* to learn

- Working Computer (Laptop/Desktop)

- Access to internet once in a while to download a few resource materials

:::

:::

------------------------------------------------------------------------

::: {.columns}

::: {.column width="40%"}

### Teaching guide

> Mixed instructional approach: guided by the presentation, but focused on building hard skills

> Engagement and practice on the fly

:::

::: {.column width="60%"}

### About

Just me; you can find the details at [qprop](https://www.qprop.me/about/).

I will be assisted by Alex.

:::

:::

## Data Analytics {.smaller}

The colloquial term ***data analytics*** can be described as both a science and an art. The ***science*** part mainly follows the statistical/mathematical procedures used; the *art* comes from the different ways and methods someone can use to present and execute the science part.

### Tools for Data Analytics

Various tools, software and applications can be used for data analysis:

::: {.columns}

::: {.column width="50%"}

- Paid for:
  - MS Excel
  - MS Power BI
  - SPSS
  - STATA
  - SAS
  - MS SQL

:::

::: {.column width="50%"}

- Open Source: **Free**
  - ***Python***
  - R
  - PostgreSQL
  - Julia
  - MongoDB
  - CouchDB

:::

:::

# Python {.smaller}

## Python Introduction

Python is an open-source programming language. It first came into use in the early 1990s and was developed by **Guido van Rossum**; more information [here](https://en.wikipedia.org/wiki/Python_(programming_language)).

::: {.columns}

::: {.column width="50%"}

### Advantages

- Easy to learn
- Syntax that reads almost like natural language
- Fast execution
- All-purpose programming language:
  - used for software development
  - used for data analysis
  - used for machine learning
  - used for web development

:::

::: {.column width="50%"}

### Disadvantages

- Some conventions differ from those of other programming languages

- High-level interpreted language, so it can run slower than compiled languages

:::

:::

## Python Interface {.smaller}

Python ships with a built-in command prompt, commonly called the *CLI* (*Command Line Interface*). Search through your Windows/Mac program files and you should see an entry such as *Python 3.x*; there are various versions of Python, and the one you have installed determines which Python CLI you get.

Though you can do everything from the CLI, it does not offer an intuitive user interface, hence the development of the *IDE* (Integrated Development Environment).

An *IDE* is like a car's dashboard, while the engine, the actual software doing the work, is *Python* in this case.

::: {.panel-tabset}

### IDEs Common to Python

- PyCharm
- Spyder
- *Jupyter*
- VS Code
- RStudio
- Positron (*new, in beta*)
- Text editors
  - Notepad++
  - Vim
  - Sublime Text
  - etc.

Many tools exist and the choice is yours. For now we will focus on VS Code, since it also works with other programming languages; if you are interested in a polyglot system, you can try Positron.

### Installations

***Let us check what installations you have***

- Python installation
- IDE installation
- To install all of this together we prefer [Anaconda](https://www.anaconda.com)
  - Anaconda | **full suite of packages and tools**
  - Miniconda | **minimal and necessary packages**

:::

## Python | Hello World

Python is an interpreted, high-level programming language, which makes things easier for the programmer and quick to pick up.

Let us start with the first code.

::: {.panel-tabset}

### Hello World

```{python}

#| echo: True

print("Hello World")

```

Let us use it as a calculator.

```{python}

#| echo: true

#| eval: false

#| code-line-numbers: "|1|3|5|7"

2 + 2 #Add

3 - 1 #Subtract

4 * 5 #Multiply

#20

20 / 5 #Divide

5 ** 2 #Exponent

#25

5 % 2 #?What is the result

```

Using it as an ***input***

```{python}

#| echo: true

#| eval: false

input("What's your name")

```

Using ***comments***

```{python}

#| echo: true

#This is a comment

```

Assigning objects to names (**variables**)

```{python}

#| echo: true

department = "DT"

print(department)

# you can change the variable on the fly

department = 'Customs'

print(department)

```

Rules for variable names:

* Can't start with a number (1, 2, 3, 4, ...)

* Letters, numbers and underscores are allowed in the name, but quotes (`'`), hyphens (`-`) and spaces are not
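
The naming rules above can also be checked from Python itself. The sketch below relies on the built-in `str.isidentifier()` method and the standard-library `keyword` module; the helper name `is_valid_variable_name` and the sample names are illustrative assumptions, not part of the course material. It also rejects reserved keywords, which cannot be used as names either.

```python
import keyword

def is_valid_variable_name(name):
    """Return True if `name` can be used as a Python variable name."""
    # isidentifier(): only letters, digits and underscores, no leading digit
    # iskeyword(): rejects reserved words such as `for` or `class`
    return name.isidentifier() and not keyword.iskeyword(name)

# Hypothetical candidate names, chosen to hit each rule
candidates = ["department", "tax_2024", "2nd_quarter", "total-sales", "my name", "class"]
results = {name: is_valid_variable_name(name) for name in candidates}

for name, ok in results.items():
    print(f"{name!r}: {ok}")
```

Only `department` and `tax_2024` pass; the others break one of the rules or clash with a keyword.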

### Keywords

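**Python** has *keywords*: words that have a syntactic role in the language and therefore cannot be used as variable names. The list below is not exhaustive; the full list for your installed version is available as `keyword.kwlist` in the standard library. In Python 3, `print` and `exec` are built-in functions rather than keywords.

`and` `continue` `except` `global` `lambda` `raise` `yield`

`as` `def` `if` `not` `return`

`assert` `del` `finally` `import` `or` `try`

`break` `elif` `for` `in` `pass` `while`

`class` `else` `from` `is` `with`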

### Data types

**Numbers**

Integers, floating-point numbers and complex numbers fall under the Python numbers category.

> We can use the `type()` function to know which class a variable or a value belongs to and the `isinstance()` function to check if an object belongs to a particular class.

```{python}

#| echo: true

a = 5

print(a, "is of type", type(a))

a = 2.0

print(a, "is of type", type(a))

a = 1+2j

print(a, "is of type", type(a))

print(a, "is complex number?", isinstance(a, complex))

```

**Strings**

A sequence of characters used to store and represent text-based information.

```{python}

#| echo: true

first_string = "My first String"

first_string

```

```{python}

#| echo: true

#| eval: false

long_string = """Very long string

spanning multiple lines

that never seem to come to an end"""

long_string

```

```{python}

#| echo: true

#| eval: false

print(first_string.capitalize())

print(first_string.title())

print(first_string.upper())

print(first_string.swapcase())

print(first_string.find('is'))

print(first_string.replace('first', 'second'))

print(first_string.strip())

print(first_string.isalnum())

print(first_string.isalpha())

print(first_string.isdigit())

print(first_string.isprintable())

```

***List***

Mutable ordered sequence of items.

```{python}

#| echo: true

first_list = [1,2,3,4,1,1,1,1]

first_list

```

List objects provide several methods

```{python}

#| echo: true

#| eval: false

first_list.count(1)

first_list.index(1)

first_list.append(5)

first_list.remove(5)

first_list.pop(-1)

first_list.reverse()

first_list.sort()

```

***Tuples***

Immutable ordered sequence of items. Once created, a tuple cannot be modified.

```{python}

#| echo: true

first_tuple = (1,2,3)

first_tuple

```
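
To see what "immutable" means in practice, here is a minimal sketch comparing a list with a tuple. The variable names (`numbers_list`, `numbers_tuple`, `longer_tuple`) are made up for illustration.

```python
numbers_list = [1, 2, 3]
numbers_tuple = (1, 2, 3)

numbers_list[0] = 99            # lists are mutable: this works

try:
    numbers_tuple[0] = 99       # tuples are immutable: this raises TypeError
    tuple_changed = True
except TypeError as err:
    tuple_changed = False
    print("Cannot modify a tuple:", err)

# "Changing" a tuple really means building a new one
longer_tuple = numbers_tuple + (4,)
print(numbers_list, numbers_tuple, longer_tuple)
```

The original tuple is untouched; `longer_tuple` is a brand-new object.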

***Sets***

Unordered collections of unique items.

```{python}

#| echo: true

#| eval: false

{42, 3.14, 'hello'} # Literal for a set with three items

{100} # Literal for a set with one item

set() # Empty set (can't use {}—empty dict!)

```
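
Because a set keeps only unique items, it is handy for removing duplicates and for comparing groups. A small sketch follows, with a made-up `visits` list; `sorted()` is used when printing because a set has no guaranteed order.

```python
# Hypothetical list with repeated entries
visits = ["Nairobi", "Kiambu", "Nairobi", "Machakos", "Kiambu"]
unique_visits = set(visits)     # duplicates are dropped

a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b                   # items in a or b
intersection = a & b            # items in both a and b
difference = a - b              # items in a but not in b

print(sorted(unique_visits))
print(union, intersection, difference)
```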

***Dictionary***

Arbitrary collection of objects indexed by nearly arbitrary values called keys.

```{python}

#| echo: true

first_dic = {'a' : [1,2,3], 'b' : [4,5,6], 'c' : [7,6,8]}

first_dic

```

```{python}

#| echo: true

#| eval: false

{'x':42, 'y':3.14, 'z':7} # Dictionary with three items, str keys

{1:2, 3:4} # Dictionary with two items, int keys

{1:'za', 'br':23} # Dictionary with mixed key types

{} # Empty dictionary

dict(x=42, y=3.14, z=7) # Dictionary with three items, str keys

dict([(1, 2), (3, 4)]) # Dictionary with two items, int keys

dict([(1,'za'), ('br',23)]) # Dictionary with mixed key types

dict() # Empty dictionary

```
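
Creating a dictionary is only half the story; most of the time we look values up by key, add new keys and loop over the pairs. The sketch below assumes a small `population` dictionary built from county figures used elsewhere in these slides.

```python
population = {"Nairobi": 4397073, "Kiambu": 2417735}

population["Kajiado"] = 1117840          # add a new key/value pair
kiambu = population["Kiambu"]            # look up a value by its key
mombasa = population.get("Mombasa")      # .get() returns None when the key is missing

for county, people in population.items():
    print(county, people)

total = sum(population.values())         # add up all the values
print("Total:", total)
```

Using `population["Mombasa"]` instead of `.get()` would raise a `KeyError`, since that key was never added.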

### Methods & Functions

**Methods**

Method: a function attached to an object (and, more broadly, defined on its class or data type). Methods are called with dot notation, e.g. `first_string.upper()`.

```{python}

#| echo: true

first_string.upper()

first_string.lower()

first_string.swapcase()

first_string.rsplit() # the default separator (delimiter) is whitespace

```

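***Functions***

Function: a named, reusable block of code that is called with its arguments in parentheses, e.g. `len(first_list)`.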

```{python}

#| echo: true

first_list = [1,2,3,4]

print(len(first_list), sum(first_list)) # built-in functions called on the list

```
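
Besides the built-in functions, we can write our own with `def`. A minimal sketch follows; the function name `mean_of` is a hypothetical example, not something defined elsewhere in the course.

```python
def mean_of(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

first_list = [1, 2, 3, 4]

print(len(first_list))        # built-in function
print(mean_of(first_list))    # our own function
```

Notice the difference in how they are called: a function takes the object as an argument (`len(first_list)`), while a method hangs off the object (`first_list.count(1)`).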

### Summary Sheets

**associativity** of the operator: L (left-to-right), R (right-to-left), or NA (nonassociative).

|**Operator**|**Description**|**Associativity**|
|----|-----|------|
|`{key:expr,...}` | Dictionary creation | NA |
|`{ expr ,...}` | Set creation | NA |
|`[ expr ,...]` | List creation | NA |
|`( expr ,...)` | Tuple creation or just parentheses | NA |
|`f ( expr ,...)` | Function call | L |
|`x [ index : index ]` | Slicing | L |
|`x [ index ]` | Indexing | L |
|`x . attr` | Attribute reference | L |
|`x ** y` | Exponentiation (x to the yth power) | R |
|`~ x` | Bitwise NOT | NA |
|`+x, -x` | Unary plus and minus | NA |
|`x*y, x/y, x//y, x%y` | Multiplication, division, floor division, remainder | L |
|`x+y, x-y` | Addition, subtraction | L |
|`x<<y, x>>y` | Left-shift, right-shift | L |
|`x & y` | Bitwise AND | L |
|`x ^ y` | Bitwise XOR | L |
|`x | y` | Bitwise OR | L |
|`x<y, x<=y, x>y, x>=y, x<>y (v2 only), x!=y, x==y` | Comparisons (less than, less than or equal, greater than, greater than or equal, inequality, equality) | NA |
|`x is y, x is not y` | Identity tests | NA |
|`x in y, x not in y` | Membership tests | NA |
|`not x` | Boolean NOT | NA |
|`x and y` | Boolean AND | L |
|`x or y` | Boolean OR | L |
|`x if expr else y` | Ternary operator | NA |
|`lambda arg,...: expr` | Anonymous simple function | NA |

:::

## Python | Further into Hello World

::: {.panel-tabset}

### Conditions & Iterations

Conditions are criteria where we compare values and decide what step to take.

Examples of conditional constructs are `if-else` and `if-elif-else`; `while` repeats a block for as long as its condition holds.

```{python}

#| echo: true

#| eval: false

if condition:
    pass  # do something when the condition is True
else:
    pass  # do something else otherwise

```
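
The template above becomes concrete once there is a real condition to test. Below is a minimal `if-elif-else` sketch; the function name `classify_population` and the thresholds are assumptions picked only for illustration.

```python
def classify_population(population):
    """Label a county by population size (thresholds are illustrative)."""
    if population >= 4_000_000:
        return "very large"
    elif population >= 2_000_000:
        return "large"
    else:
        return "medium"

print(classify_population(4397073))
print(classify_population(2417735))
print(classify_population(1117840))
```

Python checks the conditions from top to bottom and runs only the first branch that is true.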

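Comparison operators go hand in hand with conditions.

Comparison operators include `==`, `!=`, `<`, `<=`, `>` and `>=`. Conditions can be combined with the logical operators `and`, `or` and `not`; `&` and `|` play the same role element by element on NumPy arrays and pandas objects.

Iteration means repeating, and the most common form of iteration is the `for` loop.

Code highlight for `for`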

```{python}

#| echo: true

#| eval: false

for value in a_list:
    pass  # do something with each value

```
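
Here is a runnable sketch of both kinds of loop. The contents of `a_list` are assumed sample values (the county populations used elsewhere in these slides), and the `countdown` example is made up to show `while`.

```python
a_list = [4397073, 2417735, 1117840, 1421932]

# for: runs the body once per item in the list
total = 0
for value in a_list:
    total += value
print("Total:", total)

# while: repeats for as long as the condition stays True
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1
print(steps)
```

Use `for` when you know what you are looping over, and `while` when you only know the condition that should stop the loop.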

:::

## Python modules

## Modules introduction

When our program grows bigger, it is a good idea to break it into different modules.

A module is a file containing Python definitions and statements. A module's filename ends with the extension `.py`.

Definitions inside a module can be imported into another module or into the interactive Python interpreter. We use the `import` keyword to do this.

For example, we can import the `math` module by typing `import math`.

```{python}

#| echo: true

import math

print(math.pi)

```
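
There is more than one way to write an import. The sketch below shows three common forms using the standard-library `math` module; the alias `m` and the variable `area` are arbitrary choices for illustration.

```python
import math                    # import the whole module, use it as math.<name>
import math as m               # same module under a shorter alias
from math import sqrt, pi      # import only specific names

area = pi * 2 ** 2             # area of a circle with radius 2

print(math.sqrt(16))           # 4.0
print(m.floor(2.7))            # 2
print(sqrt(9))                 # 3.0
print(round(area, 2))          # 12.57
```

The alias form is the one used for `numpy` (`np`) and `pandas` (`pd`) in the rest of these slides.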

Checking the module search paths using the **sys** module

```{python}

#| echo: true

import sys

print(sys.path)

```

## Python numpy | Introduction {.smaller}

***numpy***: Python module/library specialized in *Arrays and Vectorized Computation*.

NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python. NumPy works with arrays of one to n dimensions.

::: {.panel-tabset}

### Intro

```{python}

#| echo: true

import numpy as np

my_arr = np.arange(10)

#my_list = list(range(10)) #inbuilt python

my_arr

```

Multidimensional Array

```{python}

#| echo: true

data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])

data

a = np.array([1,2,3])

b = np.array([(1.5,2,3), (4,5,6)], dtype = float)

c = np.array([[(1.5,2,3), (4,5,6)],[(3,2,1), (4,5,6)]], dtype = float)

np.zeros((3,4)) #Create an array of zeros

np.ones((2,3,4),dtype=np.int16) #Create an array of ones

d = np.arange(10,25,5) #Create an array of evenly spaced values (step value)

np.linspace(0,2,9) #Create an array of evenly spaced values (number of samples)

e = np.full((2,2),7)#Create a constant array

f = np.eye(2) #Create a 2X2 identity matrix

np.random.random((2,2)) #Create an array with random values

np.empty((3,2)) #Create an empty array

```

### Array descriptors & inspectors

```{python}

#| echo: true

#| eval: false

data.shape #Array dimensions

len(a)#Length of array

b.ndim #Number of array dimensions

e.size #Number of array elements

b.dtype #Data type of array elements

b.dtype.name #Name of data type

b.astype(int) #Convert an array to a different type

```

### Array Arithmetics

```{python}

#| echo: true

#| eval: false

g = a - b #Subtraction

np.subtract(a,b) #Subtraction

b + a #Addition

np.add(b,a) #Addition

a/b #Division

np.divide(a,b) #Division

a * b #Multiplication

np.multiply(a,b) #Multiplication

np.exp(b) #Exponentiation

np.sqrt(b) #Square root

np.sin(a) #Print sines of an array

np.cos(b) #Elementwise cosine

np.log(a)#Elementwise natural logarithm

e.dot(f) #Dot product

```
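
Note that `a` has three elements while `b` has two rows of three, yet `a - b` still works. NumPy stretches the smaller array across the larger one, which is called *broadcasting*. A minimal sketch using the same `a` and `b` defined in the Intro tab:

```python
import numpy as np

a = np.array([1, 2, 3])                                # shape (3,)
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)    # shape (2, 3)

# a is subtracted from each row of b in turn (broadcasting)
g = a - b

print(g.shape)
print(g)

# The operator and the function form give the same result
same = np.array_equal(a - b, np.subtract(a, b))
print(same)
```

Broadcasting only works when the shapes are compatible; here the trailing dimension (3) matches in both arrays.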

**Comparison**

```{python}

#| echo: true

#| eval: false

a == b #Elementwise comparison

a< 2 #Elementwise comparison

np.array_equal(a, b) #Arraywise comparison

```

**Sorting Arrays**

```{python}

#| echo: true

#| eval: false

a.sort() #Sort an array

c.sort(axis=0) #Sort the elements of an array's axis

```

**Subsetting, slicing, indexing**

```{python}

#| echo: true

#| eval: false

a[2] #Select the element at the 2nd index

b[1,2] #Select the element at row 1 column 2(equivalent to b[1][2])

a[0:2]#Select items at index 0 and 1

b[0:2,1] #Select items at rows 0 and 1 in column 1

b[:1] #Select all items at row0(equivalent to b[0:1, :])

c[1,...] #Same as[1,:,:]

a[ : : -1] #Reversed array a array([3, 2, 1])

a[a<2] #Select elements from a less than 2

b[[1,0,1, 0],[0,1, 2, 0]] #Select elements(1,0),(0,1),(1,2) and(0,0)

b[[1,0,1, 0]][:,[0,1,2,0]] #Select a subset of the matrix’s rows and columns

```
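
To make the selections above easier to follow, the sketch below runs a few of them on the same `a` and `b` arrays from the Intro tab and stores each result under a descriptive name (the names are made up for illustration). Remember that indexing starts at 0 and that the end of a slice is excluded.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

first_two = a[0:2]                        # items at index 0 and 1
element = b[1, 2]                         # row 1, column 2
column = b[0:2, 1]                        # rows 0 and 1 of column 1
small = a[a < 2]                          # boolean mask: keep items less than 2
reversed_a = a[::-1]                      # step of -1 walks the array backwards
picked = b[[1, 0, 1, 0], [0, 1, 2, 0]]    # elements (1,0), (0,1), (1,2), (0,0)

print(first_two, element, column, small, reversed_a, picked)
```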

### Array Manipulation

```{python}

#| echo: true

#| eval: false

i = np.transpose(b) #Permute array dimensions

i.T #Permute array dimensions

b.ravel() #Flatten the array

g.reshape(3, -1) #Reshape, but don’t change data (-1 lets NumPy work out that dimension)

h = np.resize(a, (2,6)) #Return a new array with shape (2,6), repeating a to fill it

np.append(h,g) #Append items to an array

np.insert(a,1,5) #Insert items in an array

np.delete(a,[1]) #Delete items from an array

np.concatenate((a,d),axis=0) #Concatenate arrays

np.vstack((a,b)) #Stack arrays vertically(row wise)

np.r_[e,f] #Stack arrays vertically(row wise)

np.hstack((e,f)) #Stack arrays horizontally(column wise)

np.column_stack((a,d)) #Create stacked column wise arrays

np.c_[a,d] #Create stacked column wise arrays

np.hsplit(a,3) #Split the array horizontally into 3 equal parts

np.vsplit(c,2) #Split the array vertically into 2 equal parts

```
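
Checking `.shape` before and after is the quickest way to see what each manipulation does. A short sketch follows, reusing `a`, `b` and `d` as defined in the Intro tab; the result names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
d = np.arange(10, 25, 5)                               # array([10, 15, 20])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)

stacked_rows = np.vstack((a, d))          # a and d become rows: shape (2, 3)
stacked_cols = np.column_stack((a, d))    # a and d become columns: shape (3, 2)
flat = b.ravel()                          # flatten to one dimension: shape (6,)
reshaped = flat.reshape(3, -1)            # -1 is inferred by NumPy: shape (3, 2)
parts = np.hsplit(a, 3)                   # three arrays with one element each

print(stacked_rows.shape, stacked_cols.shape, flat.shape, reshaped.shape, len(parts))
```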

:::

## Python Pandas | Introduction

***Pandas***: Python module/library providing data manipulation tools designed to make data cleaning and analysis fast and convenient in Python. It works in tandem with *numpy*, which is the core working under the hood.

::: {.panel-tabset}

### Intro

```{python}

#| echo: true

import pandas as pd

obj_series = pd.Series([4, 7, -5, 3])

obj_series

```

The main difference between pandas and numpy is that pandas has indexed values and is designed for working with tabular or heterogeneous data.

Pandas relies on dataframes: an Excel-like data format with rows/records and columns/fields, which makes it easy to work with.

Each row stands for an observation, and the columns are variables.

```{python}

#| echo: true

data = {

'county' : ['Nairobi','Kiambu','Kajiado','Machakos'],

'headquarters' : ['Nairobi','Kiambu','Kajiado','Machakos'],

'population' : [4397073,2417735,1117840,1421932]

}

df_data = pd.DataFrame(data,columns=['county','headquarters','population'])

df_data

```

Checking and investigating the dataframe

```{python}

#| echo: true

#| eval: false

df_data.shape    # attribute, not a method: (rows, columns)

df_data.index    # attribute: row labels

df_data.columns  # attribute: column labels

df_data.info()

df_data.count()

```

### Pandas Summary

```{python}

#| echo: true

#| eval: false

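df_data.sum()                        # column totals (string columns are concatenated)
df_data.cumsum()
df_data.min()
df_data.max()
df_data['population'].idxmax()       # index label of the largest value
df_data['population'].idxmin()       # index label of the smallest value
df_data.describe()
df_data.mean(numeric_only=True)
df_data.median(numeric_only=True)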

```
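
As a worked example, the sketch below applies a few of these summaries to the `population` column of the county dataframe from the Intro tab. The result names (`total`, `largest_county`, and so on) are illustrative.

```python
import pandas as pd

df_data = pd.DataFrame({
    'county': ['Nairobi', 'Kiambu', 'Kajiado', 'Machakos'],
    'population': [4397073, 2417735, 1117840, 1421932],
})

total = df_data['population'].sum()                    # sum of the column
largest_row = df_data['population'].idxmax()           # index label of the maximum
largest_county = df_data.loc[largest_row, 'county']    # county on that row
mean_pop = df_data['population'].mean()
median_pop = df_data['population'].median()

print(total, largest_county, mean_pop, median_pop)
```

`idxmax()` returns the *index label* of the largest value, not the value itself, which is why it is combined with `.loc` to fetch the county name.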

### Pandas Import/Export [Read/Write]

**Read CSV**

```{python}

#| echo: true

#| eval: false

#import pandas as pd

df = pd.read_csv('file.csv', header = None, nrows=5)

df.to_csv("first_dataframe.csv")

```
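
The snippet above needs a real `file.csv` on disk. To try the same calls without any file, the sketch below reads from an in-memory string using the standard-library `io.StringIO`; the CSV text and variable names are assumptions for illustration. Unlike the example above, it lets pandas take the column names from the first line; with `header=None`, that first line would be treated as data and the columns numbered 0, 1, and so on.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "county,population\nNairobi,4397073\nKiambu,2417735\nKajiado,1117840\n"

# nrows=2 reads only the first two data rows
df = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(df)

# Write back out; index=False leaves out the row labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)
round_trip = buffer.getvalue()
print(round_trip)
```

Passing `index=False` to `to_csv` is usually what you want, otherwise the row numbers are saved as an extra unnamed column.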

**Read excel**

```{python}

#| echo: true

#| eval: false

df = pd.read_excel('excel_file.xlsx', sheet_name='Sheet1')

df_multiple_excel = pd.ExcelFile('excel_file.xlsx')

df = pd.read_excel(df_multiple_excel, sheet_name='Sheet1')

df.to_excel('first_dataframe.xlsx', sheet_name = 'Sheet first')

```

|Function|Description|
|-------|--------|
|`read_csv`| Load delimited data from a file, URL, or file-like object; use comma as default delimiter |
|`read_fwf`| Read data in fixed-width column format (i.e., no delimiters) |
|`read_clipboard`| Variation of `read_csv` that reads data from the clipboard; useful for converting tables from web pages |
|`read_excel`| Read tabular data from an Excel XLS or XLSX file |
|`read_hdf`| Read HDF5 files written by pandas |
|`read_html`| Read all tables found in the given HTML document |
|`read_json`| Read data from a JSON (JavaScript Object Notation) string representation, file, URL, or file-like object |
|`read_feather`| Read the Feather binary file format |
|`read_orc`| Read the Apache ORC binary file format |
|`read_parquet`| Read the Apache Parquet binary file format |
|`read_pickle`| Read an object stored by pandas using the Python pickle format |
|`read_sas`| Read a SAS dataset stored in one of the SAS system's custom storage formats |
|`read_spss`| Read a data file created by SPSS |
|`read_sql`| Read the results of a SQL query (using SQLAlchemy) |
|`read_sql_table`| Read a whole SQL table (using SQLAlchemy); equivalent to using a query that selects everything in that table using `read_sql` |
|`read_stata`| Read a dataset from Stata file format |
|`read_xml`| Read a table of data from an XML file |

### Pandas Data Cleaning

> A common rule of thumb: about 80% of the work done on data is cleaning

#### Dealing with missing data

|Method|Description|
|------|---------|
|`dropna`| Filter axis labels based on whether values for each label have missing data, with varying thresholds for how much missing data to tolerate. |
|`fillna`| Fill in missing data with some value or using an interpolation method such as "ffill" or "bfill". |
|`isna`| Return Boolean values indicating which values are missing/NA. |
|`notna`| Negation of `isna`, returns True for non-NA values and False for NA values. |
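
Let us see three of these methods in action. The sketch below builds a tiny dataframe in which one population value has been deliberately blanked out with `np.nan`; the data and variable names are for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "county": ["Nairobi", "Kiambu", "Kajiado"],
    "population": [4397073, np.nan, 1117840],    # one value blanked on purpose
})

missing_mask = df["population"].isna()     # True where the value is missing
dropped = df.dropna()                      # drop rows with any missing value
filled = df.fillna({"population": 0})      # replace missing population with 0

print(missing_mask.tolist())
print(dropped)
print(filled)
```

Whether to drop or fill depends on the analysis: dropping loses rows, while filling with 0 can distort totals and averages.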

#### Data Transformation

***Removing Duplicates***

```{python}

#| echo: true

#| eval: false

data = pd.DataFrame({"k1": ["one", "two"] * 3 + ["two"],

"k2": [1, 1, 2, 3, 3, 4, 4]})

data

data.duplicated()

data.drop_duplicates()

data["v1"] = range(7)

data

data.drop_duplicates(subset=["k1"])

data.drop_duplicates(["k1", "k2"], keep="last")

data.sort_values(by = 'k1') #sort

```

**Sample Cleaning**

```{python}

#| echo: true

#| eval: false

# Reading data using pandas

df = pd.read_csv("https://rcs.bu.edu/examples/python/DataAnalysis/Salaries.csv")

# List first 5 records

df.head()

#Select column

df['sex']

df.sex

# Group data using rank
df_rank = df.groupby(["rank"])
df_rank.head()
# Calculate mean value for each numeric column per each group
df_rank.mean(numeric_only=True)

# Once groupby object is created we can calculate various statistics for each group:

#Calculate mean salary for each professor rank:

df.groupby('rank')[['salary']].mean()

# Note: If single brackets are used to specify the column (e.g. salary), then the output is Pandas Series object.

# When double brackets are used the output is a Data Frame

#Calculate mean salary for each professor rank:

df.groupby(['rank'], sort=False)[['salary']].mean()

# subset the rows in which the salary value is greater than $120K:

df_sub = df[df['salary'] > 120000]

df_sub.head()

#Select only those rows that contain female professors:

df_f = df[df['sex'] == 'Female']

#Selecting rows

df[0:10]

#Select rows by their labels:

df_sub.loc[10:20,['rank','sex','salary']]

#Select rows by their positions:

df_sub.iloc[10:20,[0, 3, 4, 5]]

#We can sort the data using 2 or more columns:

df_sorted = df.sort_values( by =['service', 'salary'], ascending = [True, False])

df_sorted.head(10)

```
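
The sample above downloads its data from the internet. For practising offline, the sketch below repeats the key ideas (single vs double brackets after `groupby`, filtering, and `.loc` vs `.iloc`) on a tiny made-up dataframe. The values are invented for illustration and are not taken from `Salaries.csv`.

```python
import pandas as pd

# Made-up data, only the column names mirror the Salaries example
df = pd.DataFrame({
    "rank": ["Prof", "Prof", "AsstProf", "AsstProf"],
    "sex": ["Male", "Female", "Male", "Female"],
    "salary": [150000, 130000, 80000, 90000],
})

mean_series = df.groupby("rank")["salary"].mean()     # single brackets: Series
mean_frame = df.groupby("rank")[["salary"]].mean()    # double brackets: DataFrame

high = df[df["salary"] > 120000]                      # rows matching a condition

by_label = df.loc[1:2, ["rank", "salary"]]    # labels 1 to 2, end included
by_position = df.iloc[1:2, [0, 2]]            # positions 1 up to 2, end excluded

print(type(mean_series).__name__, type(mean_frame).__name__)
print(len(high), len(by_label), len(by_position))
```

The last line highlights a common surprise: `.loc[1:2]` returns two rows while `.iloc[1:2]` returns only one.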

**Sample 2**

```{python}

#| echo: true

#| eval: false

# Read a dataset with missing values

flights = pd.read_csv("https://rcs.bu.edu/examples/python/DataAnalysis/flights.csv")

# Select the rows that have at least one missing value

flights[flights.isnull().any(axis=1)].head()

#There are a number of methods to deal with missing values in the data frame:

#df.dropna(): drop missing observations

#df.dropna(how = "all"): drop observations where all cells are NA

#df.dropna(axis = 1,how = "all"): drop column if all the values are missing

#df.dropna(thresh = 5): Drop rows that contain fewer than 5 non-missing values

#df.fillna(0): Replace missing values with zeros

#df.isnull(): returns True if the value is missing

#df.notnull(): Returns True for non-missing values

```
