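The eval method lets an expression run at C speed without allocating intermediate arrays, so no memory is spent on temporaries (pandas uses the numexpr engine for this when numexpr is installed).
A plain pandas expression, by contrast, is evaluated step by step: if it contains several steps, each step allocates its own block of memory for its result.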
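To make the point about intermediate memory concrete, here is a minimal sketch in plain NumPy. The array names and the size of 1000 are illustrative assumptions, not part of the benchmark in this note. It shows that a one-line compound expression is equivalent to a sequence of steps, each producing a full-size temporary array.

```python
import numpy as np

# Illustrative data; the names a, b, c and the size are assumptions
rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = rng.standard_normal(1000)
c = rng.standard_normal(1000)

# One-line form: NumPy still evaluates it one operation at a time
one_line = a / (b + 0.1) - c

# Equivalent explicit form: each step allocates a full-size array
tmp1 = b + 0.1        # first temporary
tmp2 = a / tmp1       # second temporary
stepwise = tmp2 - c   # final result

# Both forms perform the same operations in the same order
print(np.array_equal(one_line, stepwise))
```

With 10 million rows of float64, each such temporary is roughly 80 MB, which is the overhead that eval is meant to avoid.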
import timeit

import numpy as np
import pandas as pd

# Three float columns of 10 million values each, plus a constant string column
df = pd.DataFrame({'a': np.random.randn(10000000),
                   'b': np.random.randn(10000000),
                   'c': np.random.randn(10000000),
                   'x': 'x'})
# print(df)

# Plain pandas expression: each step allocates a temporary array
start_time = timeit.default_timer()
df['a'] / (df['b'] + 0.1) - df['c']
end_time = timeit.default_timer()
print(end_time - start_time)
print("___________________")

# The same expression passed to pd.eval as a string
start_time = timeit.default_timer()
pd.eval("df['a'] / (df['b'] + 0.1) - df['c']")
end_time = timeit.default_timer()
print(end_time - start_time)
Running time comparison (in seconds; plain expression first, pd.eval second):
0.136633455546
___________________
0.087637596342
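A single timed run can be noisy, so the comparison is worth repeating. The sketch below is one possible way to do that, not the measurement used above: it assumes a smaller size of 1 million rows to keep the run short, uses standalone Series named a, b and c (illustrative names), checks that both forms give the same values, and reports the best of five runs for each. Actual numbers depend on the machine and on whether numexpr is installed.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000_000  # assumption: smaller than the original 10 million
a = pd.Series(rng.standard_normal(n))
b = pd.Series(rng.standard_normal(n))
c = pd.Series(rng.standard_normal(n))

# Check that both forms compute the same values
plain = a / (b + 0.1) - c
evaluated = pd.eval("a / (b + 0.1) - c")
print(np.allclose(plain, evaluated))

# Time each form five times and keep the best run
plain_times, eval_times = [], []
for _ in range(5):
    start = timeit.default_timer()
    a / (b + 0.1) - c
    plain_times.append(timeit.default_timer() - start)

    start = timeit.default_timer()
    pd.eval("a / (b + 0.1) - c")
    eval_times.append(timeit.default_timer() - start)

print("plain  :", min(plain_times))
print("pd.eval:", min(eval_times))
```

Taking the minimum rather than the mean reduces the influence of other processes that happen to run during a measurement.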
As of version 0.13 (released January 2014), Pandas includes some experimental tools that allow you to directly access C-speed operations without costly allocation of intermediate arrays.
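Besides the top-level pd.eval, a DataFrame has its own eval method, which lets the expression refer to columns by name. The following is a small sketch under assumed data (1000 random rows, and a hypothetical new column name d), intended only to show the call shape.

```python
import numpy as np
import pandas as pd

# Illustrative data; size and seed are assumptions
rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.standard_normal(1000),
                   'b': rng.standard_normal(1000),
                   'c': rng.standard_normal(1000)})

# Reference result using ordinary column arithmetic
expected = df['a'] / (df['b'] + 0.1) - df['c']

# Column names can be used directly inside the expression string
result = df.eval("a / (b + 0.1) - c")
print(np.allclose(result, expected))

# An assignment inside the string returns a new DataFrame with the
# extra column; df itself is unchanged because inplace defaults to False
with_d = df.eval("d = a / (b + 0.1) - c")
print(list(with_d.columns))
```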