python3

an intro

outline

definitions
application structure
changes + porting strategies

definitions

definitions: quick ones

py2: python2.x (realistically 2.7)
py3: python3.x (realistically >=3.3)
py2+py3: code crafted to run under both versions
six: a pretty bare-bones python module for 2+3 compatibility

definitions: text

human representation
"abc☃💩🐵"
string operations (slicing, replacing) are meaningful
py2: unicode, py3: str

definitions: bytes

computer representation of text
b'abc\xe2\x98\x83\xf0\x9f\x92\xa9\xf0\x9f\x90\xb5'
string operations not meaningful (may break a character)
py2: bytes (str), py3: bytes
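
A small py3 sketch of the difference between the two definitions, using the same example string as above. The variable names are made up for illustration.

```python
text = "abc☃💩🐵"
data = text.encode("UTF-8")

# Slicing text keeps whole characters together.
text_prefix = text[:4]

# Slicing bytes cuts the snowman's 3-byte sequence after its first byte.
data_prefix = data[:4]

try:
    data_prefix.decode("UTF-8")
    broke_a_character = False
except UnicodeDecodeError:
    broke_a_character = True

print(len(text), len(data), broke_a_character)
```

Six characters become fourteen bytes, and the byte slice no longer decodes.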

definitions: native string

the default string type used by that python version
most stdlib apis written against this type
py2 + py3: str (convenient!)

application structure

outside world

interfaces

app

interfaces

network, filesystem, camera, etc.
all speak in bytes
encode to talk to them

application

collect data from interfaces
compute business logic
decode data from interfaces to use

how?

convert to and from bytes at boundaries
deal with text internally
pretty hard in py2! (we'll get to this)
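
A minimal sketch of that shape: decode on the way in, work on text, encode on the way out. `shout` and `handle` are hypothetical names, and UTF-8 is an assumption about what the interface speaks.

```python
def shout(text):
    """Business logic: only ever sees text."""
    return text.upper()


def handle(raw_in, encoding="UTF-8"):
    text = raw_in.decode(encoding)    # bytes -> text at the inbound boundary
    result = shout(text)              # text in, text out
    return result.encode(encoding)    # text -> bytes at the outbound boundary


reply = handle(b"caf\xc3\xa9")
```

Because `shout` works on text, the accented character is uppercased as one character rather than mangled byte by byte.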

porting strategies

to py2+py3 and beyond!

at a high level

syntax passes
linting passes
importable
tests pass

changes!

feature flags

new py3 features - enable them in py2 via flags
enabled via imports from the __future__ module
easiest steps to writing py2+py3 compatible code
turn on the flags on a per-module basis

from __future__ import unicode_literals

The default type for string literals in code becomes text. In python2, string literals were bytes by default. To explicitly make a bytes literal, use the b'' prefix.
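
A short sketch of a module with the flag turned on. The import has to be the first statement in the module. It is a no-op on py3; the variable names are illustrative only.

```python
from __future__ import unicode_literals

text_literal = "snowman: \u2603"   # text on both py2 and py3
bytes_literal = b"snowman"         # bytes on both, thanks to the b'' prefix

# unicode on py2, str on py3
text_type = type("")

round_trip = text_literal.encode("UTF-8").decode("UTF-8")
```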

from __future__ import absolute_import

Imports always start from sys.path roots.
Importing a module x becomes unambiguous
Adding a module can't break other modules' imports

from __future__ import print_function

print x
becomes: print(x)
print >>sys.stderr, x
becomes: print(x, file=sys.stderr)
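
A sketch of the function form in use. The keyword arguments (`sep`, `end`, `file`) are part of the py3 print function; the in-memory buffer is only there to make the output easy to inspect.

```python
from __future__ import print_function

import io
import sys

print("to stdout")
print("to stderr", file=sys.stderr)

# file= accepts any object with a write() method.
buf = io.StringIO()
print(u"a", u"b", sep=u", ", end=u"!\n", file=buf)
captured = buf.getvalue()
```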

from __future__ import division

Less often relevant. In python3, / performs true (floating point) division by default, even on two ints. Use x // y to explicitly do floor (integer) division.
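
A quick sketch of the two operators with the flag on (a no-op on py3). Note that // floors, so negative results round toward negative infinity.

```python
from __future__ import division

true_quotient = 7 / 2       # true division, even with two ints
floor_quotient = 7 // 2     # explicit floor division
negative_floor = -7 // 2    # floors toward negative infinity, not toward zero
```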

moves

Many modules were non-pep8 named or poorly organized and were moved in py3. A few examples:

ConfigParser -> configparser
urlparse / urllib / urllib2 -> urllib.parse, urllib.request, urllib.error
SimpleHTTPServer -> http.server

moves (cont.)

six.moves provides easy access to the moved modules.

from six.moves.urllib.error import URLError
from six.moves import range
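
For comparison, a sketch of the hand-rolled fallback that six.moves saves you from writing in every module. It uses only the stdlib names listed above.

```python
try:
    # py3 locations
    from configparser import ConfigParser
    from urllib.parse import urlparse
except ImportError:
    # py2 locations
    from ConfigParser import ConfigParser
    from urlparse import urlparse

parts = urlparse("http://example.com/a/b?x=1")
```

This works, but the try/except gets repeated wherever a moved module is used, which is the duplication six.moves removes.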

iterators

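Many things which returned lists in py2 now return iterators or lazy views. xrange is gone and range is now a lazy sequence (what xrange was in py2). dicts lose the .iter{items,keys,values}() methods; .items(), .keys() and .values() return views instead of lists.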

iterators (cont.)

Often the iter{...} functions were faster in py2 than their list counterparts. Sometimes not!
If you're not terribly concerned about performance in py2, switch to use the py3 names (range, .items(), etc.).
If you're concerned about performance, six provides helpers like six.iteritems(dict_obj) to use iterators in 2+3
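
A sketch of the "just use the py3 names" option. It runs on both versions; only the memory behaviour differs. The `counts` dict is made up for the example.

```python
counts = {"spam": 2, "eggs": 1}

# A list on py2, a lazy range object on py3; looping works the same.
total = sum(i for i in range(4))

# A list on py2, a view on py3; sorted() accepts either.
pairs = sorted(counts.items())

# Wrap in list() when you really need a snapshot,
# for example when mutating the dict while looping over its keys.
for key in list(counts.keys()):
    if counts[key] < 2:
        del counts[key]
```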

explicit string types

In python 2, adding a str object to a unicode object often just worked. In py2, implicit conversion between bytes and text was allowed via the US-ASCII encoding.

# py2
>>> 'foo' + u'☃'  # Implicitly 'foo'.decode('US-ASCII') + u'☃'
u'foo\u2603'
>>> '💩' + u'hi'  # Implicitly '💩'.decode('US-ASCII') + u'hi'
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 0: ordinal not in range(128)
>>> u'💩'.decode('UTF-8')  # implicitly u'💩'.encode('US-ASCII').decode('UTF-8')
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f4a9' in position 0: ordinal not in range(128)

Each of these is a western bias!

explicit string types

In python3, the bytes and text types are explicitly separated. Mixing the two types raises a TypeError.

# py3
>>> b'' + ''
TypeError: can't concat bytes to str
>>> '☃'.decode('UTF-8')
...
AttributeError: 'str' object has no attribute 'decode'
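
A py3 sketch of the fix: pick one type and convert explicitly. The variable names are illustrative.

```python
greeting = b"hello "
name = "☃"

try:
    greeting + name
except TypeError:
    mixed_types_rejected = True
else:
    mixed_types_rejected = False

# Decide which type you want, then convert the other operand.
as_text = greeting.decode("UTF-8") + name
as_bytes = greeting + name.encode("UTF-8")
```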

explicit bytes type

The bytes type in py2 gives the illusion that it is a useful string type. Iterating it returns you 1-length bytes objects.
In py3, iterating a bytes object gives you integers (each byte)
six provides shims; ex: six.iterbytes(...)
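
Two stdlib-only spellings that behave the same on py2 and py3, as a sketch of what the shims smooth over.

```python
data = b"abc"

# Slicing always returns bytes, so this yields 1-length bytes objects on both.
as_single_bytes = [data[i:i + 1] for i in range(len(data))]

# Iterating a bytearray always yields integers on both.
as_ints = list(bytearray(data))
```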

text apis everywhere!

The stdlib (wherever possible) now requires text objects where it previously allowed either bytes or text. This makes it easier to write a correct application which deals with text internally.

cheat sheet for string types

| have               | want  | code              |
| ------------------ | ----- | ----------------- |
| text               | bytes | x.encode('UTF-8') |
| bytes              | text  | x.decode('UTF-8') |
| object (int, etc.) | text  | six.text_type(x)  |
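
The first two rows can be wrapped in a pair of helpers for code that may receive either type. `to_bytes` and `to_text` are hypothetical names, not part of six or the stdlib, and UTF-8 is assumed as the default.

```python
def to_bytes(x, encoding="UTF-8"):
    """Return x as bytes, encoding it if it is text."""
    if isinstance(x, bytes):
        return x
    return x.encode(encoding)


def to_text(x, encoding="UTF-8"):
    """Return x as text, decoding it if it is bytes."""
    if isinstance(x, bytes):
        return x.decode(encoding)
    return x


snowman_bytes = to_bytes(u"\u2603")
snowman_text = to_text(b"\xe2\x98\x83")
```

Helpers like these are best kept at the boundaries; internal code should already know it holds text.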

files

In py2, open yields bytes; in py3, open (in text mode) gives you text.
Use io.open to get the python3 behaviour in python2

with io.open('f.txt', encoding='UTF-8') as f:
    # ...
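
A fuller sketch of the same call, writing and then reading back. The temporary directory is only there to keep the example self-contained.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "f.txt")

# Text mode with an explicit encoding: write() takes text.
with io.open(path, "w", encoding="UTF-8") as f:
    f.write(u"snowman: \u2603")

# Text mode: read() returns text.
with io.open(path, encoding="UTF-8") as f:
    text = f.read()

# Binary mode: read() returns the encoded bytes as stored on disk.
with io.open(path, "rb") as f:
    raw = f.read()
```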

subprocesses

subprocesses return bytes by default. .decode() their output to get text

x = subprocess.check_output(('echo', 'hi')).decode('UTF-8')
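
A portable variant of the same idea, as a sketch: it runs the current interpreter instead of echo so it does not depend on the platform. UTF-8 is an assumption about the child's output encoding.

```python
import subprocess
import sys

# check_output returns whatever the child wrote to stdout, as bytes.
raw = subprocess.check_output((sys.executable, "-c", "print('hi')"))

# Decode at the boundary, then work with text.
text = raw.decode("UTF-8")
```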

urls

In python2, the url libraries dealt with bytes; in python3 they're text apis which use UTF-8 for url encoding.

Use yelp_uri to get the python3 behaviour in python2.
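
A py3-only sketch of the text api behaviour described above, using the stdlib urllib.parse (not yelp_uri).

```python
from urllib.parse import quote, unquote

# Text in, text out; non-ASCII characters are percent-encoded as UTF-8.
quoted = quote("/snow/☃")
round_trip = unquote(quoted)
```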

http

http itself is a protocol of bytes. In both py2 and py3, the low-level Response objects will generally give you bytes objects (for instance when accessing .body).

To work with text objects, generally pick some higher-level abstraction such as the requests library.

c extensions

Relatively rare that you'll need to do this.

#if PY_MAJOR_VERSION >= 3
#define PySass_IF_PY3(three, two) (three)
#define PySass_Int_FromLong(v) PyLong_FromLong(v)
#define PySass_Bytes_AS_STRING(o) PyBytes_AS_STRING(o)
#define PySass_Object_Bytes(o) PyUnicode_AsUTF8String(PyObject_Str(o))
#else
#define PySass_IF_PY3(three, two) (two)
#define PySass_Int_FromLong(v) PyInt_FromLong(v)
#define PySass_Bytes_AS_STRING(o) PyString_AS_STRING(o)
#define PySass_Object_Bytes(o) PyObject_Str(o)
#endif

/* ... */

PyObject* py_result = PyObject_CallFunction(pyfunc, PySass_IF_PY3("y", "s"), path);
PyObject* signature = PySass_Object_Bytes(sass_function);

failures of py3

They couldn't get everything right!

surrogateescape - fake characters hidden in text strings to work with POSIX filesystem apis
PEP 3333 - WSGI for py3. As specced, the wsgi environ is latin1-decoded text (western bias! mojibake unless careful!). Use .encode('latin1').decode('UTF-8') any time you need to access the data.
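
A py3 sketch of both quirks. The path and filename are made up, and no real WSGI server is involved; the latin1 decode stands in for what the server does before handing you the environ.

```python
# What the client sent on the wire.
wire_bytes = "/snow/☃".encode("UTF-8")

# What a PEP 3333 server puts in the environ: the bytes decoded as latin1.
path_info = wire_bytes.decode("latin1")

# Undo the latin1 step, then decode as what it really was.
recovered = path_info.encode("latin1").decode("UTF-8")

# surrogateescape: a filename that is not valid UTF-8 still round-trips.
raw_name = b"caf\xe9.txt"
as_text = raw_name.decode("UTF-8", "surrogateescape")
back_to_bytes = as_text.encode("UTF-8", "surrogateescape")
```

`path_info` is mojibake until it is re-encoded, and `as_text` carries a lone surrogate that only survives if you encode with surrogateescape again.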
