"abc☃💩🐵"
unicode, py3: str
b'abc\xe2\x98\x83\xf0\x9f\x92\xa9\xf0\x9f\x90\xb5'
bytes (str), py3: bytes
str (convenient!)
bytes
__future__ module
from __future__ import unicode_literals
The default type for string literals in code becomes text. In python2, string literals were bytes by default. To explicitly make a bytes literal, use the b'' prefix.
from __future__ import absolute_import
Imports are resolved against the sys.path roots. import x becomes unambiguous: it always means the top-level module x.
from __future__ import print_function
print x -> print(x)
print >>sys.stderr, x -> print(x, file=sys.stderr)
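A minimal sketch of the function form, written for python3 (where it is the default). The helper name `warn` and the in-memory `buffer` are hypothetical, used only so the output can be inspected.

```python
import io
import sys


def warn(message, stream=None):
    """Write a message to stderr, or to `stream` if one is given."""
    # sys.stderr is looked up at call time so later redirection is respected.
    print(message, file=stream if stream is not None else sys.stderr)


# Capture the output in memory instead of writing to the real stderr.
buffer = io.StringIO()
warn('disk almost full', stream=buffer)
captured = buffer.getvalue()
```

Because print is an ordinary function here, the destination is just a keyword argument and can be swapped in tests.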
from __future__ import division
Not relevant as often. In python3, / does true (floating point) division by default. Use x // y to explicitly do integer (floor) division.
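The two operators side by side, as a small sketch. This shows python3 behaviour; in python2 the same results would need `from __future__ import division` at the top of the module. The variable names are illustrative only.

```python
# "/" is true division: the result is a float even for two ints.
true_quotient = 7 / 2

# "//" is floor division: the result stays an int.
floor_quotient = 7 // 2

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = -7 // 2
```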
Many modules were non-pep8 named or poorly organized and were moved in py3. A few examples:
ConfigParser -> configparser
urlparse / urllib / urllib2 -> urllib.parse, urllib.request, urllib.response
SimpleHTTPServer -> http.server
six.moves provides easy access to the moved modules.
from six.moves.urllib_error import URLError
from six.moves import range
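For comparison, a sketch of the hand-written fallback that six.moves saves you from repeating. This is not how six is implemented, just the plain try/except idiom using only the stdlib; the example URL is made up.

```python
try:
    # python3 locations
    from configparser import ConfigParser
    from urllib.parse import urlparse
except ImportError:
    # python2 locations
    from ConfigParser import ConfigParser
    from urlparse import urlparse

# Either branch yields the same names, so the rest of the module is identical.
parts = urlparse('http://example.com/a/b?q=1')
```

Every moved module needs its own block like this, which is the repetition six.moves removes.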
Many things which returned lists in py2 now return iterators or lazy views. xrange is gone and range is now lazy (it behaves like xrange did).
dicts lose the .iter{items,keys,values}() functions.
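A short python3 sketch of what "no longer a list" means in practice. The names (`numbers`, `inventory`, etc.) are illustrative only.

```python
# range computes values on demand, so no billion-element list is built.
numbers = range(10 ** 9)
count = len(numbers)
last = numbers[-1]

# dict.items() returns a view that tracks the dict rather than a copied list.
inventory = {'apples': 3, 'pears': 5}
items_view = inventory.items()
inventory['plums'] = 7  # added after the view was created

# Materialise explicitly when an actual list is needed.
snapshot = sorted(items_view)
```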
range, .items(), etc.).
six provides helpers like six.iteritems(dict_obj) to use iterators in 2+3
In python 2, adding a str object to a unicode object often just worked.
In py2, implicit conversion between bytes and text was allowed via the US-ASCII encoding.
# py2
>>> 'foo' + u'☃' # Implicitly 'foo'.decode('US-ASCII') + u'☃'
u'foo\u2603'
>>> '💩' + u'hi' # Implicitly '💩'.decode('US-ASCII') + u'hi'
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
>>> u'💩'.decode('UTF-8') # implicitly u'💩'.encode('US-ASCII').decode('UTF-8')
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f4a9' in position 0: ordinal not in range(128)
Each of these is a western bias!
In python3, the bytes and text types are explicitly separated. Mismatching the two types is a TypeError.
# py3
>>> b'' + ''
TypeError: can't concat bytes to str
>>> '☃'.decode('UTF-8')
...
AttributeError: 'str' object has no attribute 'decode'
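The same failure as a runnable python3 sketch, followed by the explicit fix. The exact wording of the error message varies between python3 versions, so only the exception type is relied on here. Variable names are illustrative.

```python
raw = b'abc'
text = 'def'

try:
    raw + text  # mixing bytes and text is rejected outright
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

# Decode at the boundary, then work with text only.
combined = raw.decode('UTF-8') + text
```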
The bytes type in py2 gives the illusion that it is a useful string type. Iterating it returns you 1-length bytes objects.
In py3, iterating a bytes object gives you integers (each byte).
six provides shims; ex: six.iterbytes(...)
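A python3 sketch of the integer behaviour, plus a slicing idiom that recovers 1-length bytes objects without any shim. Variable names are illustrative.

```python
data = b'abc'

# Iterating or indexing yields integers.
as_integers = list(data)
first = data[0]

# Slicing always yields bytes, even for a single position.
first_slice = data[0:1]
one_byte_chunks = [data[i:i + 1] for i in range(len(data))]
```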
The stdlib (wherever possible) now requires text objects where it previously allowed either bytes or text. This makes it easier to write a correct application which deals with text internally.
| have | want | code |
| ------------------ | ----- | ------------------- |
| text | bytes | x.encode('UTF-8') |
| bytes | text | x.decode('UTF-8') |
| object (int, etc.) | text | six.text_type(x) |
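The table rows as a python3 round trip, using the sample string from the start of the deck. On python3 six.text_type is simply str, so the sketch uses str directly to avoid depending on six.

```python
text = 'abc☃💩🐵'

# text -> bytes
encoded = text.encode('UTF-8')

# bytes -> text
decoded = encoded.decode('UTF-8')

# object -> text (six.text_type(x) in 2+3 code)
number_as_text = str(42)
```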
In py2, open yielded bytes; in py3, open gives you text.
Use io.open to get the python3 behaviour in python2.
with io.open('f.txt', encoding='UTF-8') as f:
    # ...
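A fuller sketch of the same call, run against a throwaway file so the text and bytes views can be compared. The temporary directory and the file contents are made up for the example.

```python
import io
import os
import tempfile

directory = tempfile.mkdtemp()
path = os.path.join(directory, 'f.txt')

# Text mode with an explicit encoding: write and read str objects.
with io.open(path, 'w', encoding='UTF-8') as f:
    f.write('abc☃')

with io.open(path, encoding='UTF-8') as f:
    as_text = f.read()

# Binary mode: the same file, seen as the raw encoded bytes.
with io.open(path, 'rb') as f:
    as_bytes = f.read()

os.remove(path)
os.rmdir(directory)
```

Passing encoding explicitly avoids depending on the platform's default encoding.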
subprocesses return bytes by default. Call .decode() on their output to get text.
x = subprocess.check_output(('echo', 'hi')).decode('UTF-8')
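A portable variant of the line above, as a sketch: it runs the current Python interpreter instead of echo, which is an assumption made so the example does not depend on a particular shell.

```python
import subprocess
import sys

# check_output returns the child's stdout as bytes.
raw_output = subprocess.check_output(
    (sys.executable, '-c', 'print("hi")'),
)

# Decode once, right at the boundary.
text_output = raw_output.decode('UTF-8')
```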
In python2, the url libraries dealt with bytes; in python3 they're text apis which use UTF-8 for url encoding.
Use yelp_uri to get the python3 behaviour in python2.
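What "text apis which use UTF-8" looks like with the python3 stdlib, as a small sketch (yelp_uri itself is not shown here). The path is made up.

```python
from urllib.parse import quote, unquote

# Text in, text out: non-ASCII characters are percent-encoded as UTF-8.
# "/" is left alone because it is in quote()'s default safe set.
escaped = quote('/snow/☃')

# unquote reverses it, decoding the escapes as UTF-8 by default.
restored = unquote(escaped)
```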
http itself is a protocol of bytes. In both py2 and py3, the low-level Response objects will generally give you bytes objects (for instance when accessing .body).
To work with text objects, generally pick some higher-level abstraction such as the requests library.
Relatively rare that you'll need to do this.
#if PY_MAJOR_VERSION >= 3
#define PySass_IF_PY3(three, two) (three)
#define PySass_Int_FromLong(v) PyLong_FromLong(v)
#define PySass_Bytes_AS_STRING(o) PyBytes_AS_STRING(o)
#define PySass_Object_Bytes(o) PyUnicode_AsUTF8String(PyObject_Str(o))
#else
#define PySass_IF_PY3(three, two) (two)
#define PySass_Int_FromLong(v) PyInt_FromLong(v)
#define PySass_Bytes_AS_STRING(o) PyString_AS_STRING(o)
#define PySass_Object_Bytes(o) PyObject_Str(o)
#endif
/* ... */
PyObject* py_result = PyObject_CallFunction(pyfunc, PySass_IF_PY3("y", "s"), path);
PyObject* signature = PySass_Object_Bytes(sass_function);
They couldn't get everything right!
.encode('latin1').decode('UTF-8')
any time you need to access data
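Why that round trip recovers the data, as a python3 sketch. The slide does not say which API hands over the mis-decoded text, so this assumes the data was UTF-8 bytes that some layer decoded as latin1. Variable names are illustrative.

```python
original = 'abc☃'
utf8_bytes = original.encode('UTF-8')

# Assumption: a lower layer decoded the UTF-8 bytes as latin1.
# latin1 maps every byte to a code point, so this never fails; it just
# produces the wrong characters.
mis_decoded = utf8_bytes.decode('latin1')

# Encoding back to latin1 restores the exact bytes; then decode properly.
recovered = mis_decoded.encode('latin1').decode('UTF-8')
```

The trick only works because latin1 is a lossless one-to-one mapping between bytes and the first 256 code points.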