"abc☃💩🐵"unicode, py3: strb'abc\xe2\x98\x83\xf0\x9f\x92\xa9\xf0\x9f\x90\xb5'bytes (str), py3: bytesstr (convenient!)bytes__future__ modulefrom __future__ import unicode_literals
The default type for string literals in code becomes text.
In python2, string literals were bytes by default.
To explicitly make a bytes literal, use the b'' prefix.
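A minimal sketch of the two literal kinds. It uses explicit u'' and b'' prefixes so that it reads the same with or without unicode_literals (this assumes py2.7 or py3.3+); the variable names are illustrative only.

```python
# Explicit prefixes mean the same thing on py2.7 and py3.3+.
text = u'abc\u2603'   # text: 4 code points, the last one is the snowman
data = b'abc'         # bytes: the b'' prefix always makes a bytes literal

# Encoding text produces bytes; the snowman takes 3 bytes in UTF-8.
encoded = text.encode('UTF-8')

text_length = len(text)
encoded_length = len(encoded)
```

With unicode_literals enabled, the u'' prefix becomes redundant but stays harmless.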
from __future__ import absolute_import
Imports resolve from the sys.path roots, so import x becomes unambiguous.
from __future__ import print_function
print x -> print(x)
print >>sys.stderr, x -> print(x, file=sys.stderr)
from __future__ import division
This one is relevant less often. In python3, / performs floating point
division by default. Use x // y to explicitly do integer division.
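A small sketch of the two operators, assuming python3 semantics. On py2 the same results need from __future__ import division as the first statement of the module.

```python
# py3 behaviour (py2 needs `from __future__ import division` at the top
# of the module to match).
true_quotient = 7 / 2      # floating point division
floor_quotient = 7 // 2    # integer (floor) division
negative_floor = -7 // 2   # floor division rounds towards negative infinity
```

Note that // floors rather than truncates, which matters for negative operands.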
Many modules were non-pep8 named or poorly organized and were moved in py3. A few examples:
ConfigParser -> configparser
urlparse / urllib / urllib2 -> urllib.parse, urllib.request, urllib.response
SimpleHTTPServer -> http.server
six.moves provides easy access to the moved modules.
from six.moves.urllib_error import URLError
from six.moves import range
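For comparison, this is roughly the boilerplate that six.moves saves you from writing by hand. It is a stdlib-only sketch, not how six implements it; the section and option names are made up for the example.

```python
try:
    import configparser                   # py3 name
except ImportError:
    import ConfigParser as configparser   # py2 name

try:
    from urllib.parse import urlparse     # py3 location
except ImportError:
    from urlparse import urlparse         # py2 location

# After the imports, the rest of the code is identical on both versions.
parser = configparser.RawConfigParser()
parser.add_section('server')
parser.set('server', 'port', '8000')
port = parser.get('server', 'port')

host = urlparse('http://example.com/path').netloc
```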
Many things which returned lists in py2 now return iterators or lazy views. xrange is gone and range is now lazy, like xrange was.
dicts lose the .iter{items,keys,values}() methods.
range, .items(), etc.).
six provides helpers like six.iteritems(dict_obj) to use iterators in 2+3
In python 2, adding a str object to a unicode object often just worked.
In py2, implicit conversion between bytes and text was allowed via the US-ASCII encoding.
# py2
>>> 'foo' + u'☃' # Implicitly 'foo'.decode('US-ASCII') + u'☃'
u'foo\u2603'
>>> '💩' + u'hi' # Implicitly '💩'.decode('US-ASCII') + u'hi'
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
>>> u'💩'.decode('UTF-8') # implicitly u'💩'.encode('US-ASCII').decode('UTF-8')
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f4a9' in position 0: ordinal not in range(128)
Each of these is a western bias!
In python3, the bytes and text types are explicitly separated.
Mixing the two types raises a TypeError.
# py3
>>> b'' + ''
TypeError: can't concat bytes to str
>>> '☃'.decode('UTF-8')
...
AttributeError: 'str' object has no attribute 'decode'
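A short sketch of the fix, assuming python3: pick one type and convert explicitly before combining. Variable names are illustrative.

```python
data = b'abc'
text = 'def'

error_message = None
try:
    data + text                  # py3: mixing bytes and text is refused
except TypeError as exc:
    error_message = str(exc)

# Convert explicitly, in whichever direction you actually need.
joined_text = data.decode('UTF-8') + text
joined_bytes = data + text.encode('UTF-8')
```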
The bytes type in py2 gives the illusion that it is a useful string type. Iterating it returns you 1-length bytes objects.
In py3, iterating a bytes object gives you integers (each byte).
six provides shims; ex: six.iterbytes(...)
The stdlib (wherever possible) now requires text objects where it previously allowed either bytes or text.
This makes it easier to write a correct application which deals with text internally.
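The iteration difference for bytes can be seen directly. This sketch assumes python3 for the first line; the slicing and bytearray forms behave the same on py2 and py3 and need no six.

```python
data = b'abc'

# py3: iterating bytes yields integers (py2 would yield 1-length strings).
as_ints = list(data)

# Slicing always yields 1-length bytes objects, on both versions.
as_bytes = [data[i:i + 1] for i in range(len(data))]

# Iterating a bytearray yields integers on both versions.
portable_ints = list(bytearray(data))
```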
| have | want | code |
| ------------------ | ----- | ------------------- |
| text | bytes | x.encode('UTF-8') |
| bytes | text | x.decode('UTF-8') |
| object (int, etc.) | text | six.text_type(x) |
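The table rows, run end to end. To stay stdlib-only, this sketch uses type(u'') as a stand-in for six.text_type; that stand-in is an assumption of the example, not part of six.

```python
text_type = type(u'')    # unicode on py2, str on py3

text = u'\u2603'         # the snowman

encoded = text.encode('UTF-8')      # text -> bytes
decoded = encoded.decode('UTF-8')   # bytes -> text
number_as_text = text_type(42)      # object -> text
```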
In py2, open yielded bytes; in py3, open gives you text.
Use io.open to get the python3 behaviour in python2.
import io

with io.open('f.txt', encoding='UTF-8') as f:
    # ...
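A runnable round trip with io.open, as a sketch. The temporary directory and file name are only there to make the example self-contained.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'f.txt')

# Text mode with an explicit encoding: write and read text objects.
with io.open(path, 'w', encoding='UTF-8') as f:
    f.write(u'abc\u2603')

with io.open(path, encoding='UTF-8') as f:
    contents = f.read()

# Binary mode still gives the raw bytes that are on disk.
with io.open(path, 'rb') as f:
    raw = f.read()
```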
By default, subprocesses return bytes. Call .decode() on their output to get text.
x = subprocess.check_output(('echo', 'hi')).decode('UTF-8')
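The same idea in a form that does not depend on an echo binary being available. Running the current interpreter through sys.executable is a choice made for this sketch.

```python
import subprocess
import sys

# check_output returns bytes on both py2 and py3.
raw_output = subprocess.check_output((sys.executable, '-c', 'print("hi")'))

# Decode at the boundary, then work with text from here on.
text_output = raw_output.decode('UTF-8')
```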
In python2, the url libraries dealt with bytes; in python3 they're text APIs which use UTF-8 for url encoding.
Use yelp_uri to get the python3 behaviour in python2.
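A sketch of the python3 behaviour using only the stdlib (py3 only, since it imports from urllib.parse). Text goes in, text comes out, and UTF-8 is used for the percent-encoding.

```python
from urllib.parse import quote, unquote, urlencode

quoted = quote('\u2603')               # text in, percent-encoded text out
unquoted = unquote('%E2%98%83')        # and back to text
query = urlencode({'q': '\u2603'})     # query strings follow the same rule
```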
http itself is a protocol of bytes. In both py2 and py3, the low-level Response objects will generally give you bytes objects (for instance when accessing .body).
To work with text objects, generally pick some higher-level abstraction such as the requests library.
It's relatively rare that you'll need to make a C extension work on both versions.
#if PY_MAJOR_VERSION >= 3
#define PySass_IF_PY3(three, two) (three)
#define PySass_Int_FromLong(v) PyLong_FromLong(v)
#define PySass_Bytes_AS_STRING(o) PyBytes_AS_STRING(o)
#define PySass_Object_Bytes(o) PyUnicode_AsUTF8String(PyObject_Str(o))
#else
#define PySass_IF_PY3(three, two) (two)
#define PySass_Int_FromLong(v) PyInt_FromLong(v)
#define PySass_Bytes_AS_STRING(o) PyString_AS_STRING(o)
#define PySass_Object_Bytes(o) PyObject_Str(o)
#endif
/* ... */
PyObject* py_result = PyObject_CallFunction(pyfunc, PySass_IF_PY3("y", "s"), path);
PyObject* signature = PySass_Object_Bytes(sass_function);
They couldn't get everything right!
.encode('latin1').decode('UTF-8') any time you need to access data