forked from yasoob/intermediatePython
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathopen_function.html
More file actions
137 lines (124 loc) · 10.3 KB
/
open_function.html
File metadata and controls
137 lines (124 loc) · 10.3 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Open function</title>
<link rel="stylesheet" href="_static/epub.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
</head>
<body role="document">
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="targeting_python_2_3.html" title="Targeting Python 2+3"
accesskey="N">next</a></li>
<li class="right" >
<a href="for_-_else.html" title="For - Else"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Python Tips 0.1 documentation</a> »</li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="body" role="main">
<div class="section" id="open-function">
<h1>Open function</h1>
<p><a class="reference external" href="http://docs.python.org/dev/library/functions.html#open">open</a><span class="link-target"> [http://docs.python.org/dev/library/functions.html#open]</span> opens
a file. Pretty simple, eh? Most of the time, we see it being used like
this:</p>
<div class="code python highlight-python"><div class="highlight"><pre><span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s">'photo.jpg'</span><span class="p">,</span> <span class="s">'r+'</span><span class="p">)</span>
<span class="n">jpgdata</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="n">f</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</pre></div>
</div>
<p>The reason I am writing this article is that most of the time, I see
open used like this. There are <strong>three</strong> errors in the above code. Can
you spot them all? If not, read on. By the end of this article, you’ll
know what’s wrong in the above code, and, more importantly, be able to
avoid these mistakes in your own code. Let’s start with the basics:</p>
<p>The return of open is a file handle, given out from the operating system
to your Python application. You will want to return this file handle
once you’re finished with the file, if only so that your application
won’t reach the limit of the number of open file handle it can have at
once.</p>
<p>Explicitly calling <code class="docutils literal"><span class="pre">close</span></code> closes the file handle, but only if the
read was successful. If there is any error just after <code class="docutils literal"><span class="pre">f</span> <span class="pre">=</span> <span class="pre">open(...)</span></code>,
<code class="docutils literal"><span class="pre">f.close()</span></code> will not be called (depending on the Python interpreter,
the file handle may still be returned, but that’s another story). To
make sure that the file gets closed whether an exception occurs or not,
pack it into a
<code class="docutils literal"><span class="pre">`with</span></code> <<a class="reference external" href="http://freepythontips.wordpress.com/2013/07/28/the-with-statement/">http://freepythontips.wordpress.com/2013/07/28/the-with-statement/</a>>`__
statement:</p>
<div class="code python highlight-python"><div class="highlight"><pre><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">'photo.jpg'</span><span class="p">,</span> <span class="s">'r+'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">jpgdata</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</pre></div>
</div>
<p>The first argument of <code class="docutils literal"><span class="pre">open</span></code> is the filename. The second one (the
<em>mode</em>) determines <em>how</em> the file gets opened.</p>
<ul class="simple">
<li>If you want to read the file, pass in <code class="docutils literal"><span class="pre">r</span></code></li>
<li>If you want to read and write the file, pass in <code class="docutils literal"><span class="pre">r+</span></code></li>
<li>If you want to overwrite the file, pass in <code class="docutils literal"><span class="pre">w</span></code></li>
<li>If you want to append to the file, pass in <code class="docutils literal"><span class="pre">a</span></code></li>
</ul>
<p>While there are a couple of other valid mode strings, chances are you
won’t ever use them. The mode matters not only because it changes the
behavior, but also because it may result in permission errors. For
example, if we were to open a jpg-file in a write-protected directory,
<code class="docutils literal"><span class="pre">open(..,</span> <span class="pre">'r+')</span></code> would fail. The mode can contain one further
character; we can open the file in binary (you’ll get a string of bytes)
or text mode (a string of characters).</p>
<p>In general, if the format is written by humans, it tends to be text
mode. <code class="docutils literal"><span class="pre">jpg</span></code> image files are not generally written by humans (and are
indeed not readable to humans), and you should therefore open them in
binary mode by adding a <code class="docutils literal"><span class="pre">b</span></code> to the text string (if you’re following
the opening example, the correct mode would be <code class="docutils literal"><span class="pre">rb</span></code>). If you open
something in text mode (i.e. add a <code class="docutils literal"><span class="pre">t</span></code>, or nothing apart from
<code class="docutils literal"><span class="pre">r/r+/w/a</span></code>), you must also know which encoding to use - for a
computer, all files are just bytes, not characters.</p>
<p>Unfortunately, <code class="docutils literal"><span class="pre">open</span></code> does not allow explicit encoding specification
in Python 2.x. However, the function
<code class="docutils literal"><span class="pre">`io.open</span></code> <<a class="reference external" href="http://docs.python.org/2/library/io.html#io.open">http://docs.python.org/2/library/io.html#io.open</a>>`__ is
available in both Python 2.x and 3.x (where it is an alias of <code class="docutils literal"><span class="pre">open</span></code>),
and does the right thing. You can pass in the encoding with the
<code class="docutils literal"><span class="pre">encoding</span></code> keyword. If you don’t pass in any encoding, a system- (and
Python-) specific default will be picked. You may be tempted to rely on
these defaults, but the defaults are often wrong, or the default
encoding cannot actually express all characters (this will happen on
Python 2.x and/or Windows). So go ahead and pick an encoding. <code class="docutils literal"><span class="pre">utf-8</span></code>
is a terrific one. When you write a file, you can just pick the encoding
to your liking (or the liking of the program that will eventually read
your file).</p>
<p>How do you find out which encoding a file you read has? Well,
unfortunately, there is no sureproof way to detect the encoding - the
same bytes can represent different, but equally valid characters in
different encodings. Therefore, you must rely on metadata (for example,
in HTTP headers) to know the encoding. Increasingly, formats just define
the encoding to be UTF-8.</p>
<p>Armed with this knowledge, let’s write a program that reads a file,
determines whether it’s JPG (hint: These files start with the bytes
<code class="docutils literal"><span class="pre">FF</span> <span class="pre">D8</span></code>), and writes a text file that describe the input file.</p>
<div class="code python highlight-python"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">io</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">'photo.jpg'</span><span class="p">,</span> <span class="s">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">inf</span><span class="p">:</span>
<span class="n">jpgdata</span> <span class="o">=</span> <span class="n">inf</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="k">if</span> <span class="n">jpgdata</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="n">b</span><span class="s">'</span><span class="se">\xff\xd8</span><span class="s">'</span><span class="p">):</span>
<span class="n">text</span> <span class="o">=</span> <span class="s">u'This is a jpeg file (</span><span class="si">%d</span><span class="s"> bytes long)</span><span class="se">\n</span><span class="s">'</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">text</span> <span class="o">=</span> <span class="s">u'This is a random file (</span><span class="si">%d</span><span class="s"> bytes long)</span><span class="se">\n</span><span class="s">'</span>
<span class="k">with</span> <span class="n">io</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s">'summary.txt'</span><span class="p">,</span> <span class="s">'w'</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="s">'utf-8'</span><span class="p">)</span> <span class="k">as</span> <span class="n">outf</span><span class="p">:</span>
<span class="n">outf</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">text</span> <span class="o">%</span> <span class="nb">len</span><span class="p">(</span><span class="n">jpgdata</span><span class="p">))</span>
</pre></div>
</div>
<p>I am sure that now you would use <code class="docutils literal"><span class="pre">open</span></code> correctly!</p>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer" role="contentinfo">
© Copyright 2015, Muhammad Yasoob Ullah Khalid.
</div>
</body>
</html>