python Archives - Sparrow Computing

Basic Counting in Python

Ben Cook — Fri, 14 May 2021 20:11:16 +0000

I love fancy machine learning algorithms as much as anyone. But sometimes, you just need to count things. And Python’s built-in data structures make this really easy. Let’s say we have a list of strings:

things = [
    "a",
    "a", "b",
    "a", "b", "c",
    "a", "b", "c", "d",
]

With a list like this, you might care about a few different counts. What’s the count of all items? What’s the count of unique items? How many instances are there of a specific value? How many instances are there of each unique value?

We can answer these questions easily and efficiently with lists, sets and dictionaries. Being very comfortable with these objects is important for writing good Python code. With that said, let’s find all our counts.

Count all values in a list

We’ll start with an easy one:

len(things)

# Expected result
# 10

The len() function works for built-in Python data structures, but it also works with any class that implements the __len__() method. For example, calling len() on a NumPy array returns the size of the first dimension.
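To see the __len__() hook in action, here is a minimal sketch. Playlist is a hypothetical class made up for illustration (not from any library). Defining __len__() is all it takes for len() to work on its instances:

```python
class Playlist:
    """Hypothetical container that stores song titles."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        # len(playlist) calls this method
        return len(self._songs)


playlist = Playlist(["intro", "verse", "chorus"])
print(len(playlist))

# Expected result
# 3
```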

Count unique values in a list

How many unique values are there in a list? Answer this question by first creating a unique collection of values (that is, a set). Then call len() on the set:

len(set(things))

# Expected result
# 4

One thing to point out here is that things doesn’t have to be a list of strings for this to work. In Python, you can put any hashable object into a set. That includes immutable built-in types like strings, numbers and tuples of hashable items (but not lists or dictionaries). Instances of your own classes are hashable by identity by default; if you want objects with equal values to count as the same element, implement the __eq__() and __hash__() methods, which handle object equality and object hashes, respectively.
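As a sketch of that last point, the hypothetical Point class below (invented for this example) defines __eq__() and __hash__() in terms of its coordinates, so two points with the same coordinates collapse into one entry in a set:

```python
class Point:
    """Hypothetical 2D point that compares and hashes by value."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must have equal hashes, so hash the same fields __eq__ compares
        return hash((self.x, self.y))


points = [Point(0, 0), Point(0, 0), Point(1, 2)]
print(len(set(points)))

# Expected result
# 2
```

Without the two methods, each instance would hash by identity and the set would keep all three points.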

Count instances of a specific value

How many instances of "a" are there in the list? You can find out with the .count() method:

things.count("a")

# Expected result
# 4

Convenient!

Count instances of all unique values

OK, but what if we want to count the number of instances of all unique values? If you use Pandas or SQL, you will probably recognize this as a group by operation. Indeed, Python comes with an itertools.groupby() function that can do this. But it’s a bit of a pain: groupby() only groups consecutive equal items, so you have to sort your list before passing it in. And if you forget to sort your list, you don’t get an error; you just get the wrong result.
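Here is roughly what the groupby() route looks like, as a sketch. It counts each group with len(list(group)) and shows the silent failure you get when the input is not sorted:

```python
from itertools import groupby

things = ["a", "a", "b", "a", "b", "c", "a", "b", "c", "d"]

# Sorted input: each unique value forms exactly one run
sorted_counts = [(value, len(list(group))) for value, group in groupby(sorted(things))]
print(sorted_counts)

# Unsorted input: no error, but each value is split across several runs
unsorted_counts = [(value, len(list(group))) for value, group in groupby(things)]
print(unsorted_counts)

# Expected result
# [('a', 4), ('b', 3), ('c', 2), ('d', 1)]
# [('a', 2), ('b', 1), ('a', 1), ('b', 1), ('c', 1), ('a', 1), ('b', 1), ('c', 1), ('d', 1)]
```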

Instead, let’s go back to our trusty friend the set. If we loop through all the unique values (the set of values) then we can call the .count() method with each one. That will tell us what we need to know:

for value in set(things):
    print(value, things.count(value))

# Expected result (set order is arbitrary, so your lines may come out in a different order)
# a 4
# c 2
# b 3
# d 1

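This is easy, and fast enough for short lists. Keep in mind that each .count() call scans the whole list, so the total work grows with the number of unique values times the length of the list.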

One other cool trick

One other thing to mention: if you want to know all of these counts for a list, consider creating a dictionary of value counts first. You can use a collections.defaultdict for this, but you can also create it in a one-liner with a dictionary comprehension:

counts = {value: things.count(value) for value in things}

counts

# Expected result
# {'a': 4, 'b': 3, 'c': 2, 'd': 1}

Now we have the count of all unique values. But you can also get all the other counts that we discussed above:

# Count all values in the list
sum(counts.values())

# Expected result
# 10

# Count unique values in the list
len(counts.keys())

# Expected result
# 4

# Count instances of a specific value
counts["a"]

# Expected result
# 4
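For comparison, here is a sketch of the collections.defaultdict approach mentioned above. It builds the same dictionary in a single pass over the list instead of calling .count() once per item. collections.Counter, also in the standard library, packages the same pattern:

```python
from collections import Counter, defaultdict

things = ["a", "a", "b", "a", "b", "c", "a", "b", "c", "d"]

# One pass over the list: a missing key starts at int(), which is 0
counts = defaultdict(int)
for value in things:
    counts[value] += 1

print(dict(counts))

# Counter produces the same counts
print(dict(Counter(things)) == dict(counts))

# Expected result
# {'a': 4, 'b': 3, 'c': 2, 'd': 1}
# True
```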

The post Basic Counting in Python appeared first on Sparrow Computing.

Installing Packages in a Jupyter Notebook

Ben Cook — Fri, 22 Jan 2021 13:28:00 +0000

Here’s a really quick recipe that I use when I’m writing scratch code. If you have a Jupyter notebook running and you want to install or upgrade a package for that environment, you can run the following in a cell:

import sys

!$sys.executable -m pip install <package>

For example, the following would upgrade seaborn to the latest version in the current environment:

import sys

!$sys.executable -m pip install --upgrade seaborn

Why does it work? Well, you can call pip with the python executable by using the -m flag:

python -m pip install ...

And in a Python session, the sys.executable attribute is a string with the full path to the executable for the current environment, e.g.:

sys.executable

# Expected result like...
# /Users/myname/anaconda3/envs/myenv/bin/python

Additionally, in a notebook, ! runs the rest of the line as a shell command, and $ inserts a Python variable from the current session into that command.

After you run the command, you might need to restart the notebook kernel for the new package to be available.
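Outside a notebook, the same idea works from plain Python through the subprocess module. The sketch below uses a hypothetical helper, pip_install(), that builds the same command as the notebook cell. It defaults to a dry run that only returns the command, so nothing is installed unless you pass dry_run=False:

```python
import subprocess
import sys


def pip_install(*packages, upgrade=False, dry_run=True):
    """Hypothetical helper: pip-install packages into the running interpreter's environment.

    Builds the same command as `!$sys.executable -m pip install ...`.
    With dry_run=True (the default here), it only returns the command.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.extend(packages)
    if not dry_run:
        # check_call raises CalledProcessError if pip exits with an error
        subprocess.check_call(cmd)
    return cmd


print(pip_install("seaborn", upgrade=True))
```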

Finally, a quick word of warning: you should usually avoid this pattern! Whenever possible, you want your Python code to be in a reproducible environment. Use something like Poetry for that, or at least save dependencies in requirements.txt.

Still, notebooks can be useful for scratch code and sometimes reproducibility is not a major concern. In those scenarios, you want to get your environment working as quickly as possible. For this kind of use case, try this trick!


Scientific Notation in Python and NumPy

Ben Cook — Mon, 04 Jan 2021 22:00:00 +0000

Python can deal with floating point numbers in both scientific and standard notation. This post explains how it works in Python and NumPy. If you just want to suppress scientific notation in NumPy, jump to the NumPy section.

You can create numbers with scientific notation in Python with e:

print(3.45e-4)

# Expected result
# 0.000345

Notice: the number is printed in standard notation even though we defined it with scientific notation. This is because Python decides whether to display a float in scientific notation based on its magnitude. In Python 3, nonzero numbers whose absolute value is less than 1e-4 or at least 1e16 are shown in scientific notation. Otherwise, Python uses standard notation.
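To see where those cutoffs fall, here is a quick check. It assumes Python 3, where print() shows the same string as repr() for floats:

```python
# Scientific notation kicks in below 1e-4 and from 1e16 upward (in absolute value)
for value in [0.0001, 0.00001, 1e15, 1e16, -0.00001]:
    print(value)

# Expected result
# 0.0001
# 1e-05
# 1000000000000000.0
# 1e+16
# -1e-05
```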

But you can override this behavior with string formatting. Use the e format type (for example, :.2e for two digits after the decimal point) to display a number in scientific notation:

x = 3.45e-4
print(f"{x:.2e}")

# Expected result
# 3.45e-04

To suppress scientific notation, use the f format type instead (here, :.6f for six digits after the decimal point):

print(f"{x:.6f}")

# Expected result
# 0.000345

With slight modifications, you can also use str.format() or %-style string formatting:

print("{:.4e}".format(x))

# Expected result
# 3.4500e-04

print("%.7f" % x)

# Expected result
# 0.0003450
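A quick sketch that puts these side by side. It also shows one pitfall worth knowing: fixed-point formatting rounds to the number of decimals you ask for, so too few decimals can hide a small number entirely:

```python
x = 3.45e-4

# f-strings, str.format() and the format() built-in share one format-spec language
print(f"{x:.2e}", "{:.2e}".format(x), format(x, ".2e"))

# %-style formatting uses the same type letters
print("%.2e" % x)

# Too few decimals rounds the value away
print(f"{x:.2f}")
print(f"{x:.6f}")

# Expected result
# 3.45e-04 3.45e-04 3.45e-04
# 3.45e-04
# 0.00
# 0.000345
```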

NumPy

Finally, in NumPy, you can suppress scientific notation with the np.set_printoptions() function:

import numpy as np

np.set_printoptions(suppress=True)
np.arange(5) / 100000

# Expected result
# array([0.     , 0.00001, 0.00002, 0.00003, 0.00004])

You can read more in the docs if you want to change other characteristics of NumPy’s array printing.
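If you only want the setting for part of your code, NumPy also provides an np.printoptions context manager (available since NumPy 1.15). This sketch assumes a fresh session with default print options; the exact spacing of the printed arrays depends on your NumPy version, so no expected output is shown:

```python
import numpy as np

values = np.arange(5) / 100000

# With default options, the tiny values switch the array to scientific notation
print(values)

# Settings passed to np.printoptions apply only inside the with-block
with np.printoptions(suppress=True):
    print(values)

# The previous options are restored afterwards
print(values)
```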


Sorting a List of Tuples in Python

Ben Cook — Thu, 31 Dec 2020 20:50:00 +0000

Let’s say you have a list of tuples in Python:

tuple_list = [(1, 1, 5), (0, 1, 3), (0, 2, 4), (1, 3, 4)]

You can sort this list with the built-in sorted() function:

sorted(tuple_list)

# Expected result
# [(0, 1, 3), (0, 2, 4), (1, 1, 5), (1, 3, 4)]

The result is a new list containing the same tuples, sorted by the first element, with ties broken by the subsequent elements. This works as-is because Python knows how to compare two tuples: it compares them element by element until it finds a difference. This means that the expression (0, 1, 3) > (0, 1, 2) evaluates to True.
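To make that rule concrete, here is a sketch of the comparison written out by hand. tuple_less_than() is a hypothetical name for illustration; the check at the end confirms it agrees with the built-in < on a handful of tuples, including one pair where a shorter tuple is a prefix of a longer one:

```python
from itertools import product


def tuple_less_than(a, b):
    """Sketch of Python's lexicographic tuple ordering (a < b)."""
    for x, y in zip(a, b):
        if x != y:
            # The first differing position decides the result
            return x < y
    # Every shared position is equal, so the shorter tuple sorts first
    return len(a) < len(b)


samples = [(1, 1, 5), (0, 1, 3), (0, 2, 4), (1, 3, 4), (0, 1), (0, 1, 2)]

# Agrees with the built-in < for every pair of samples
print(all(tuple_less_than(a, b) == (a < b) for a, b in product(samples, repeat=2)))

# Expected result
# True
```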

You can also sort the list in-place:

# This returns None, but tuple_list will now be sorted
tuple_list.sort()

sorted() and sort() both accept key and reverse keyword arguments that can be used to modify the default behavior.

key

The key argument lets you specify a custom sort order. It accepts a function that takes an element and returns the value you want to compare. So if you want to sort by the second element you can do the following:

from operator import itemgetter

sorted(tuple_list, key=itemgetter(1))

# Expected result
# [(0, 1, 3), (1, 1, 5), (0, 2, 4), (1, 3, 4)]

You will often see people use something like lambda x: x[1] instead of itemgetter(). This approach is good to be aware of because a lambda can compute any key, so it also works when the items in your list are arbitrary objects rather than tuples. But itemgetter() is a little faster when you just need to access indices. Another cool thing about itemgetter() is that you can pass in multiple arguments:

sorted(tuple_list, key=itemgetter(2, 1))

# Expected result
# [(0, 1, 3), (0, 2, 4), (1, 3, 4), (1, 1, 5)]

This will sort the list by the third element, using the second element to break ties. You can prove it to yourself by calling the function on a tuple directly:

itemgetter(2, 1)((0, 2, 4))

# Expected result
# (4, 2)
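Here is a short sketch of how itemgetter() and the equivalent lambdas line up, plus a lambda key that no single index could provide (the sum of the first and last elements, chosen arbitrarily for illustration):

```python
from operator import itemgetter

tuple_list = [(1, 1, 5), (0, 1, 3), (0, 2, 4), (1, 3, 4)]

# itemgetter(1) and this lambda extract the same key from each tuple
print(sorted(tuple_list, key=itemgetter(1)) == sorted(tuple_list, key=lambda t: t[1]))

# itemgetter(2, 1) builds the same key tuple as this lambda
print(sorted(tuple_list, key=itemgetter(2, 1)) == sorted(tuple_list, key=lambda t: (t[2], t[1])))

# A lambda can also compute a key, such as first element + last element
print(sorted(tuple_list, key=lambda t: t[0] + t[-1]))

# Expected result
# True
# True
# [(0, 1, 3), (0, 2, 4), (1, 3, 4), (1, 1, 5)]
```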

reverse

You can also sort your list in descending order by passing in reverse=True:

sorted(tuple_list, reverse=True)

# Expected result
# [(1, 3, 4), (1, 1, 5), (0, 2, 4), (0, 1, 3)]

Finally, say you want to sort by:

The second element, ascending
The third element, descending

You can accomplish this by sorting twice in reverse priority order:

sorted(sorted(tuple_list, key=itemgetter(2), reverse=True), key=itemgetter(1))

# Expected result
# [(1, 1, 5), (0, 1, 3), (0, 2, 4), (1, 3, 4)]

This works because Python’s built-in sort (TimSort in CPython) is stable: elements that compare equal on the current key keep the relative order they had after the previous sort.
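When the key you want in descending order is numeric, a single sort with a negated key is a common alternative. This is a different technique from the two-pass trick above, and it only works for numbers (you can’t negate a string), but it gives the same result here:

```python
from operator import itemgetter

tuple_list = [(1, 1, 5), (0, 1, 3), (0, 2, 4), (1, 3, 4)]

# Two stable passes: secondary key first (descending), then primary key (ascending)
two_pass = sorted(sorted(tuple_list, key=itemgetter(2), reverse=True), key=itemgetter(1))

# One pass: negate the numeric key that should sort in descending order
one_pass = sorted(tuple_list, key=lambda t: (t[1], -t[2]))

print(one_pass)
print(one_pass == two_pass)

# Expected result
# [(1, 1, 5), (0, 1, 3), (0, 2, 4), (1, 3, 4)]
# True
```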


Combinations in Python

Ben Cook — Mon, 28 Dec 2020 19:23:00 +0000

If you want to see how to create combinations without itertools in Python, jump to the “Combinations without itertools” section.

Combinations

A combination is a selection of elements from a set such that order doesn’t matter. Say we have a list [1, 2, 3]; the 2-combinations of this set are [(1, 2), (1, 3), (2, 3)]. Notice that order doesn’t matter: once we have (1, 2), we don’t also get (2, 1). By default, combinations are typically defined to be without replacement. This means that we’ll never see (1, 1): once the 1 has been drawn, it is not replaced.

You can also have combinations with replacement. The 2-combinations (with replacement) of the list [1, 2, 3] are [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]. In this case, numbers are replaced after they’re drawn.

There’s one important note before we jump into implementations of this operation in Python. The itertools combinations API treats each position in the input as a distinct element, regardless of its value. This means any iterable can be treated like a set (since all positions are unique). But it’s important to realize that if you pass in [1, 1, 2], the elements will not be de-duped for you. The 2-combinations of [1, 1, 2] according to the itertools combinations API are [(1, 1), (1, 2), (1, 2)].
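A quick sketch of that behavior, plus two ways to de-dupe if you need to. They answer different questions: the first removes repeated combinations (so (1, 1) survives), while the second removes repeated input values (so only one 1 is left to draw):

```python
import itertools

values = [1, 1, 2]

# Positions are distinct, so the duplicate 1s produce a repeated combination
print(list(itertools.combinations(values, 2)))

# Option 1: sort first so equal combinations come out as identical tuples, then de-dupe
print(sorted(set(itertools.combinations(sorted(values), 2))))

# Option 2: de-dupe the input values before combining
print(list(itertools.combinations(sorted(set(values)), 2)))

# Expected result
# [(1, 1), (1, 2), (1, 2)]
# [(1, 1), (1, 2)]
# [(1, 2)]
```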

Approaches

Combinations in itertools

It’s extremely easy to generate combinations in Python with itertools. The following generates all 2-combinations of the list [1, 2, 3]:

import itertools

sequence = [1, 2, 3]
itertools.combinations(sequence, 2)

# Expected result
# <itertools.combinations object at 0x...>

The combinations() function returns an iterator. This is what you want if you plan to loop through the combinations. But you can convert it into a list if you want all the combinations in memory:

list(itertools.combinations(sequence, 2))

# Expected result
# [(1, 2), (1, 3), (2, 3)]

A useful property of the combinations() function is that it takes any iterable as the first argument. This means you can pass lazy sequences[1] in:

list(itertools.combinations(range(3), 2))

# Expected result
# [(0, 1), (0, 2), (1, 2)]

Combinations with replacement in itertools

It’s also very easy to generate combinations with replacement:

list(itertools.combinations_with_replacement(sequence, 2))

# Expected result
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]

The interface for combinations_with_replacement() is the same as combinations().
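The number of results follows the standard counting formulas: C(n, r) without replacement and C(n + r - 1, r) with replacement. Here is a sketch that checks both against itertools, using math.comb() (which assumes Python 3.8 or newer):

```python
import itertools
import math

sequence = [1, 2, 3]
n, r = len(sequence), 2

# Without replacement: C(n, r) = n! / (r! * (n - r)!)
print(math.comb(n, r), len(list(itertools.combinations(sequence, r))))

# With replacement: C(n + r - 1, r)
print(math.comb(n + r - 1, r), len(list(itertools.combinations_with_replacement(sequence, r))))

# Expected result
# 3 3
# 6 6
```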

Combinations without itertools

Once in a while, you might want to generate combinations without using itertools. Maybe you want to change the API slightly, say by returning a list instead of an iterator, or you might want to operate on a NumPy array.

Under the hood, Python uses a C implementation of the combinations algorithm. But the documentation provides a helpful Python implementation you can use, reproduced here for convenience:

def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)

Combinations with replacement (and without itertools)

The Python docs also give us a Python-only implementation of combinations_with_replacement():

def combinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)
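If all you need is a list, a shorter recursive version is another option. This is a different technique from the iterative algorithm in the docs, and combinations_list() is a hypothetical name for this sketch. It picks a first element, then combines it with r - 1 elements from the rest of the pool. Like itertools, it distinguishes elements by position and returns tuples in the same order. Recursion depth grows with r, so it suits small r:

```python
import itertools


def combinations_list(pool, r):
    """Return all r-combinations of pool as a list (recursive sketch)."""
    pool = tuple(pool)
    if r == 0:
        # Exactly one way to choose nothing
        return [()]
    result = []
    # The first chosen element must leave room for r - 1 more after it
    for i in range(len(pool) - r + 1):
        for rest in combinations_list(pool[i + 1:], r - 1):
            result.append((pool[i],) + rest)
    return result


print(combinations_list([1, 2, 3], 2))
print(combinations_list([1, 2, 3], 2) == list(itertools.combinations([1, 2, 3], 2)))

# Expected result
# [(1, 2), (1, 3), (2, 3)]
# True
```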

Notes

[1]: Technically, range() does not return an iterator. It returns a range object, a lazy sequence that you can iterate over as many times as you like.


Upgrading pip on macOS

Ben Cook — Sun, 27 Dec 2020 19:21:00 +0000

There are many different ways to install Python (and therefore pip) on macOS. If you’re struggling with this, here are a few things to try.

If pip is already installed:

pip install --upgrade pip

If you get an error about the pip command not being found, the easiest thing to do is use your Python interpreter:

python -m pip install --upgrade pip

Some installations will also install an alias called pip3:

pip3 install --upgrade pip

If you already have a version of Python installed that does not have pip, you can manually install pip:

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
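If you are not sure whether a particular interpreter already has pip (and so whether you need get-pip.py at all), you can ask it directly. This is a sketch: save it as a script and run it with the interpreter you care about. pip_status() is a hypothetical helper, and it relies on importlib.metadata, which assumes Python 3.8 or newer:

```python
import importlib.metadata
import importlib.util
import sys


def pip_status():
    """Return this interpreter's path and its pip version (None if pip is missing)."""
    if importlib.util.find_spec("pip") is None:
        return sys.executable, None
    try:
        version = importlib.metadata.version("pip")
    except importlib.metadata.PackageNotFoundError:
        # pip is importable but has no installed metadata
        version = "unknown"
    return sys.executable, version


executable, pip_version = pip_status()
print(f"Interpreter: {executable}")
print(f"pip version: {pip_version or 'not installed'}")
```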

Another option is to update your Python installation. New installations of Python usually come with pip out of the box, and there are a few ways to install Python on a Mac.

Installing Python With Homebrew:

brew install python

You can also download the installer from python.org.

Or download the installer from Anaconda. This is what I recommend for scientific computing.


How to Force pip to Reinstall a Package

Ben Cook — Sat, 26 Dec 2020 19:17:00 +0000

Once in a while, a Python package gets corrupted on your machine and you need to force pip to reinstall it. As of pip 10.0, you can run the following:

pip install --force-reinstall <package>

This will force pip to re-install <package> and all its dependencies.

If you want to re-download the packages instead of using the files from your pip cache, add the --no-cache-dir flag:

pip install --force-reinstall --no-cache-dir <package>

If you want to upgrade the package, you can run this instead:

pip install --upgrade <package>

With the default upgrade strategy, the --upgrade flag only touches the dependencies of <package> if they no longer satisfy its requirements. Add the --force-reinstall flag if you want them reinstalled too.

If, for some reason, you want to re-install <package> and all its dependencies without first removing the current versions, you can run:

pip install --ignore-installed <package>

By the way, if you’re using a pip version that is less than 10.0, it’s time to update pip:

pip install --upgrade pip


Uninstall TensorFlow: The Unofficial Troubleshooting Guide

Ben Cook — Wed, 23 Dec 2020 15:58:00 +0000

There are lots of ways to install TensorFlow, which means (unfortunately) there is no one-size-fits-all solution for uninstalling it. The internet is littered with questions from frustrated developers and data scientists trying to remove this behemoth from their machines.

This post enumerates the solutions I’ve seen.

Uninstall TensorFlow

If you installed TensorFlow with pip

In most cases, you should be able to run: pip uninstall tensorflow
Or, if you installed tensorflow-gpu: pip uninstall tensorflow-gpu
Once in a while, the pip in your current path is not the same pip you used to install it. In this case, find the Python environment where the TensorFlow install lives and run: /path/to/python -m pip uninstall tensorflow
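To confirm which interpreter you are in and which TensorFlow distributions it can see, a small script like the following can help. It is a sketch: installed_tensorflow_packages() is a hypothetical helper, it relies on importlib.metadata (Python 3.8 or newer), and it only matches names starting with "tensorflow", so related dependencies with other names (such as tensorboard) won’t show up:

```python
import importlib.metadata
import sys


def installed_tensorflow_packages():
    """List installed distributions whose names start with "tensorflow"."""
    found = set()
    for dist in importlib.metadata.distributions():
        name = dist.metadata["Name"]
        # Skip broken installs that have no name in their metadata
        if name and name.lower().startswith("tensorflow"):
            found.add(f"{name}=={dist.version}")
    return sorted(found)


print("Interpreter:", sys.executable)
print("TensorFlow packages:", installed_tensorflow_packages() or "none found")
```

If the list is empty, TensorFlow is probably installed for a different interpreter, which is when the /path/to/python -m pip form above comes in handy.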

If you installed TensorFlow with conda

If you want to reuse your conda environment, you can run: conda remove tensorflow
If you’re willing to start a new conda environment, just remove the current one: conda remove --name <env-name> --all

If you built TensorFlow from source

Go to the source directory and run: python setup.py develop --uninstall

If you’re on Windows and you’re still having trouble

Check out this thread on GitHub

Finally, if one of the solutions above worked, you may also want to remove the packages TensorFlow installs automatically.

Alternatives

On the other hand, you may not actually need to uninstall TensorFlow. Here are a few alternatives to consider.

If you want to use TensorFlow in your current environment

You can upgrade the package: pip install tensorflow --upgrade

You can set up a virtual environment

You probably want to use Anaconda: conda create --name tensorflow-env python=3.8 pip tensorflow
But you can also use Python’s built-in venv module. This requires a few commands:
1. python -m venv tensorflow-venv
2. source tensorflow-venv/bin/activate
3. pip install tensorflow

You can use Docker

The TensorFlow docs have a good page on using Docker for TensorFlow. Ultimately, this is what I would recommend if you can make it work. Getting TensorFlow off your actual machine and into Docker will save you headaches down the road.

What am I missing?

I hope this helps. Did I miss something obvious? Let me know in the comments section.


Download a YouTube Video from the Command Line with youtube-dl

Ben Cook — Sat, 15 Feb 2020 21:57:00 +0000

To download a YouTube video from the command line, use the Python package youtube-dl:

pip install youtube-dl
youtube-dl <video-url>

If you want to specify the name of the downloaded file, use the -o flag.

For example, to download this road traffic video for object tracking to a file called traffic.mkv:

youtube-dl https://www.youtube.com/watch?v=MNn9qKG2UFI -o traffic

When the process finishes you will have the full video on your computer!


Object Spread Operator for Python

Ben Cook — Sun, 17 Mar 2019 21:49:00 +0000

Say you have a dictionary that you want to both copy and update. In JavaScript, this is a common pattern that gets its own syntax, called the object spread operator:

const oldObject = { hello: 'world', foo: 'bar' }
const newObject = { ...oldObject, foo: 'baz' }

After running this snippet, newObject will be { hello: 'world', foo: 'baz' }. Turns out, you can also do this in Python since version 3.5:

old_dict = {'hello': 'world', 'foo': 'bar'}
new_dict = {**old_dict, 'foo': 'baz'}

new_dict

# Expected result
# {'hello': 'world', 'foo': 'baz'}

You can refer to the double asterisk ** as “dictionary unpacking”. You may also have seen it in function calls, where it unpacks a dictionary into keyword arguments.
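A few details are worth knowing when you use this pattern. The sketch below shows that later entries win (so order matters), that the original dictionary is left alone, and that the copy is shallow:

```python
old_dict = {"hello": "world", "foo": "bar"}

# Later entries win, so the override goes after the unpacked dict
print({**old_dict, "foo": "baz"})

# Reversing the order lets old_dict's value overwrite the "override"
print({"foo": "baz", **old_dict})

# The copy is shallow: nested objects are shared, not duplicated
nested = {"config": {"debug": False}}
shallow = {**nested}
shallow["config"]["debug"] = True
print(nested["config"]["debug"])

# Expected result
# {'hello': 'world', 'foo': 'baz'}
# {'foo': 'bar', 'hello': 'world'}
# True
```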

Spread operator for lists

In JavaScript, you can also use spread operators in arrays to make updated copies:

const oldArray = [1, 2, 3]
const newArray = [...oldArray, 4, 5]

This would make newArray an updated copy with [ 1, 2, 3, 4, 5 ].

We can replicate the behavior for lists in Python:

old_list = [1, 2, 3]
new_list = [*old_list, 4, 5]

new_list

# Expected result
# [1, 2, 3, 4, 5]

This is somewhat less useful since you can also write old_list + [4, 5], which doesn’t work the same way in JavaScript (there, + on two arrays produces a string). But the spread operator approach is still a cool trick to know, and being comfortable with basic data structures in Python is important! You can refer to the single asterisk * as “iterable unpacking” or “splat”.
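One place where * beats +: it accepts any iterable and can appear several times in one literal, while + needs both operands to be lists. A short sketch:

```python
old_list = [1, 2, 3]

# * unpacks any iterable: a list, a range, even a string
print([*old_list, *range(4, 6), *"ab"])  # [1, 2, 3, 4, 5, 'a', 'b']

# The same syntax builds tuples and sets
print((*old_list, 4))  # (1, 2, 3, 4)
print({*old_list, *[3, 4]})  # {1, 2, 3, 4}

# + refuses to mix a list with a tuple
try:
    combined = old_list + (4, 5)
except TypeError as err:
    print("TypeError:", err)
```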
