Today I demystify I/O and its many pitfalls in Python. Slides: …
Hello, this is Anthony, back for another episode in the Python 3 series. I know it's been a while, and I'm gonna be honest, I don't really have an excuse. Anyway, here's an episode which attempts to demystify I/O in Python. Today we're going to be covering the differences between binary and text I/O.
And we'll cover the behavior in both Python 2 and Python 3. Finally, I'll give some code snippets showing you how to deal with these differences in a 2 and 3 compatible way.

First, we should define what I mean by I/O. Files on disk store data in a series of ones and zeros, but for simplicity's sake we'll just consider them to be a series of bytes. This will make it slightly easier to reason about. We'll also consider standard in and standard out, the input and output of a program, to both be a special type of file.

For text, which you can think about as a series of code points, the bytes that represent these code points have a specific encoding. In other words, a code point translated with a specific encoding will map to a specific sequence of bytes. In this example here I'm using the UTF-8 encoding. UTF-8 is an ASCII-compatible encoding: it translates code points that are in the ASCII range to the same bytes as the ASCII encoding does. It is also a variable-length encoding, translating different code points to byte sequences of different lengths. For example, our friend the snowman is translated to 3 bytes in UTF-8. At some point I'll probably do a whole video on encodings, but for now it's sufficient to know that textual data and bytes data are not the same.

First, let's talk about binary I/O. Binary I/O takes in binary data (surprise!) and writes that unmodified to the disk. Two types in Python represent binary data: bytes and bytearray. The bytes in your program are written directly to disk without any translation.

Next, we'll describe text I/O. Text, which is the unicode type in Python 2 and str in Python 3, is first translated with an encoding to a series of bytes, and then that series of bytes is written to disk. The text in your program is represented as encoded bytes on disk.

Next, we'll talk about the behavior in Python 2. The first topic is the built-in open function. Files opened with open will always be in binary mode in Python 2. Using wb or rb as the mode is redundant, though it's a good idea to document your intent. As with most Python 2 APIs, unicode may be implicitly converted by the ASCII codec when writing. The first example here demonstrates a successful implicit conversion of ASCII text, the second shows that the data is binary, and the third shows how this can go pear-shaped when writing non-ASCII textual data. It is possible to convert these binary streams into pseudo text I/O objects using the C API.

Now we'll look at some special streams in Python which do just that. Standard out and standard error, members of the sys module, are just special file objects. During interpreter startup, PyFile_SetEncodingAndErrors is called on these two objects, but in Python 2 this only happens when those streams are attached to a terminal. As such, you can write textual data to standard out and standard error, and with the print function, as long as that textual data can be encoded with the automatically detected encoding. I say automatically detected encoding because it is detected via environment variables.
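
The points so far can be checked with a small sketch that runs on both Python 2 and Python 3. The helper names can_encode and describe_stream are made up for illustration and are not part of any library, and the snowman is assumed to be the code point U+2603.

```python
from __future__ import print_function

import sys

SNOWMAN = u"\u2603"  # assumed example code point: the snowman


def can_encode(text, encoding):
    """Return True if every code point in text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def describe_stream(stream):
    """Report whether a stream is attached to a terminal, and its encoding.

    The encoding may be None, for example on Python 2 when the stream
    is a pipe or a file rather than a terminal.
    """
    return {
        "isatty": stream.isatty(),
        "encoding": getattr(stream, "encoding", None),
    }


if __name__ == "__main__":
    # UTF-8 is variable length: the snowman becomes three bytes.
    print(repr(SNOWMAN.encode("utf-8")))
    # ASCII cannot represent the snowman at all.
    print(can_encode(SNOWMAN, "ascii"))
    # What was detected for standard out in this environment?
    print(describe_stream(sys.stdout))
```

Running the script once in a terminal and once piped through another program should show how the reported values change with the environment.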
Detecting the encoding from environment variables may seem nuts, but Python inherited this idea from C. If you don't have this environment variable set, or this environment variable is set to C, the encoding will be detected as ASCII, which cannot encode very many code points. I mentioned before that Python 2 only changes these two text streams if a terminal is attached. As demonstrated here, if you're writing to a pipe or a file, the automatic encoding detection fails. In summary, in Python 2, if the environment may vary or you might not have a TTY, the only reliable way to write non-ASCII textual data to standard out or standard error is by writing encoded bytes directly.

Next, we'll talk about some in-memory I/O classes. The first is cStringIO.StringIO. This is a binary stream and acts similarly to the open function in Python 2. The second is StringIO.StringIO. This allows mixed-mode text plus binary writing if the ASCII codec can convert between the two. If any of the inputs are text, the result from reading a StringIO object will also be text; otherwise it will produce binary output. StringIO is much, much slower than its cStringIO counterpart, as it is implemented in pure Python. Here's an example where mixing data types will cause a UnicodeDecodeError.

Next, we'll switch to talking about the new behavior in Python 3. We'll just be discussing the io module throughout. The one thing to know is that in Python 3 the built-in open function is identical to io.open. io.open has two modes, dependent on the mode parameter passed in. If the mode is wb or rb (b being binary), the open function will return a binary I/O object. This object is strict and requires bytes. Going with one of the broad themes of Python 3, implicit conversion between text and bytes is explicitly an error. Otherwise, io.open will return an io.TextIOWrapper. This object requires text when writing. Here's an example demonstrating the error message received when attempting to write text to a binary stream. Equivalently, here's the same situation in reverse: writing binary data to a text stream.

If you'd like to control the encoding that is used to write text data, you can pass the encoding keyword argument. If encoding is not passed, it will be automatically determined using locale.getpreferredencoding, which usually means looking at the LANG environment variable.

Conveniently, when discussing standard out, standard error, and print in Python 3, we don't need to extend the discussion beyond saying that they are text I/O, implemented as TextIOWrappers. You can access the underlying binary object by accessing the buffer attribute. print in Python 3 will write as if writing to a text stream, converting any of the arguments with str if necessary. One interesting side effect of this is that when printing a bytes object, it will literally print the b and the quotation marks. As with Python 2, writing text is subject to environment variables, again for the same reasons.

One huge gotcha in Python 3 is that I/O is now buffered by default. This means that when calling print or write, your data may not be instantly written and may be deferred until later. If your output may be piped, such as writing to a log file, be sure to either manually call flush or use the flush=True keyword argument so that it is shown immediately.

For in-memory I/O, the story is much simpler in Python 3. There is an io.BytesIO class for binary in-memory operations and an io.StringIO counterpart for text.

Now we'll finally get to talk about how to write 2 + 3 code. Fortunately, this discussion is super easy, as the io module is included in Python 2.6+. Simply replace open calls with io.open, and replace cStringIO / StringIO with either io.BytesIO or io.StringIO. Writing binary data to the standard I/O streams can be accomplished by conditionally accessing the buffer property and using that to write. Writing text I/O is a little more difficult, as the io.TextIOWrapper backport doesn't work with the standard I/O streams. Fortunately, you can just use codecs.getwriter instead. If your program needs to write in a constrained environment where LANG might not be reliable, you can either write as bytes as seen before, or you can use a variation of the text writing where you force a specific encoding. In this case I'm forcing the UTF-8 encoding.

That's all for I/O. Thanks for watching and have a good one.
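
The slide snippets are not part of this transcript, so what follows is only a rough reconstruction of the 2 + 3 approach described above, not the original code. The helper names write_bytes, utf8_writer and roundtrip are hypothetical; io.open, getattr on the buffer attribute, and codecs.getwriter are the standard-library pieces mentioned in the talk.

```python
import codecs
import io


def write_bytes(stream, data):
    """Write bytes to a standard stream on both Python 2 and Python 3.

    Python 3 text streams expose their binary layer as .buffer;
    Python 2 file objects are already binary and have no such attribute.
    """
    binary = getattr(stream, "buffer", stream)
    binary.write(data)
    binary.flush()


def utf8_writer(stream):
    """Wrap a stream so text written to it is always encoded as UTF-8.

    Forcing the encoding avoids relying on LANG or on a TTY being attached.
    """
    binary = getattr(stream, "buffer", stream)
    return codecs.getwriter("utf-8")(binary)


def roundtrip(path, text):
    """Write text with io.open, then read it back as bytes and as text."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)  # text mode requires text (unicode / str)
    with io.open(path, "rb") as f:
        raw = f.read()  # binary mode returns bytes
    with io.open(path, encoding="utf-8") as f:
        return raw, f.read()
```

With these in place, write_bytes(sys.stdout, u"\u2603\n".encode("utf-8")) writes encoded bytes directly, while utf8_writer(sys.stdout).write(u"\u2603\n") writes text with the encoding forced to UTF-8. In Python 3, calling flush on the writer or on the stream afterwards helps when the output is piped.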