tl;dr: If you want to write cool pandas code then you need to chain your methods. It is not only more legible, but in many cases more efficient, too!

The basics

At the beginning you are inclined to write things like this (at least I did it all the time, and even used inplace=True, but more on that later):

import pandas as pd

# Read dataframe from a csv file
df = pd.read_csv('lots-of-data.csv', usecols=['A', 'B', 'C'])

# Add new columns
df['D'] = df['A'] + df['B']
df['E'] = df['A'] * df['C']

# Drop a column
new_df = df.drop('A', axis=1)

# Rename a column
even_newer_df = df.rename(columns={'B': 'F'})

# and so on...

This can get tiresome when doing a lot of operations on the same dataframe, even if we reuse our df variable:

df = df.drop('A', axis=1)
df = df.rename(columns={'B': 'F'})

But hey, what if we just, you know, chain these two together? Boom, this is method chaining:

df = df.drop('A', axis=1).rename(columns={'B': 'F'})

Creating columns in a chain

But we also have to be able to create new columns in a method chain, to avoid df['new_col'] = .... This is where .assign() comes into play:

df = df.assign(
	D=lambda x: x['A'] + x['B'],
	E=lambda x: x['A'] * x['C']
)

So .assign() takes keyword arguments to turn them into columns on df, in this case using the results of the lambda functions (alternatively you can use an “unpacked” dict, like .assign(**{'col1': ..., 'col2': ...}) where you can name col1 and col2 more freely as in a keyword argument).

What’s more, from Python 3.7 you can even use newly assigned columns in subsequent keyword arguments in the same .assign() call (pre-Python 3.6 you need to chain multiple .assign() calls if you want to use the result of one in the next.)

So, we can write our original example as the following (notice the extra parenthesis around the whole chain, which lets me break method calls into their own lines):

df = (
	pd.read_csv(
		'lots-of-data.csv', 
		usecols=['A', 'B', 'C']
	)
	.assign(
		D=lambda x: x['A'] + x['B'],
		E=lambda x: x['A'] * x['C']
	)
	.drop('A', axis=1)
	.rename(columns={'B': 'F'})
)

Piping the complete dataframe

The other esential element in method chaining is .pipe(), which takes a function as its first parameter, and passes the complete dataframe to it (unlike .agg(), .transform() and .apply(), which all operate along an axis, ie. work on rows or columns of a DataFrame).

This opens up a whole range of possibilities. For instance, we can reindex a DataFrame based on the current index, where a simple .reindex() call won’t do it, because we need to have access to the index of the original dataframe, like:

df.pipe(lambda x: x.reindex(np.arange(x.max())))

This will simply make sure that there are no gaps in the index of df, but we can pass any function into pipe that takes the whole DataFrame as a parameter.

Method chains are efficient

Avoiding these lines with method chaining are not only nice, but more efficient.

df['D'] = df['A'] + df['B']
df['E'] = df['A'] * df['C']

Due to the inner workings of pandas both of these will result in copying the whole dataframe into a new memory location (for detailed explanation check out this talk from Marc Garcia, member of the pandas core team), whereas packing it in one .assign() call helps pandas achieve this in one step. This matters a lot with larger datasets and multiple new columns.

Also, method chaining forces you to live without the evil inplace=True keyword argument in pandas methods. While doing transformations “inplace” is tempting, especially when you are new to pandas (I did it all the time, it felt somehow safer), it should be avoided.

The problem is that in some cases inplace=True will still result in moving the whole data to a new memory location, even if you are not aware of it. So you are better off always using the return values from the methods.

The flipside: don’t overuse

Just as any concepts, method chaining can be abused and overused, too. Chaining together dozens of method calls (especially .apply() and .pipe() with external functions), doing too many transformations in one huge chain, packing lambda functions into lambda functions can all make your code hard to read and even harder to debug.

But trust me, if you use it right, method chaining will definitely improve your work.