Speed up Sampling Based Calculation

December 18, 2020

Author: Percy

Updated at 2020/12/19

Body

To the best of my knowledge, most sampling-based algorithms can easily be run in multiple processes / in parallel, since the samples can be divided into parts.

This short post is a reminder to myself that even naive multiprocessing can pay off considerably.

What follows is a crude example of computing the Shapley value with Monte Carlo in Python.

  • single processing
from tqdm import trange
import numpy as np

M = 30000  # total number of sampled permutations


def eval_shap(x_train, y_train, x_test, y_test):
    for t in trange(M):
        ''' calculate Shapley value with Monte Carlo
        ...
        '''
    return np.asarray(sv)  # type(sv) -> list
  • multiprocessing
from tqdm import trange
from multiprocessing import Pool
from functools import partial
import numpy as np

M = 30000      # total number of sampled permutations
procs_num = 5  # number of worker processes


def sub_task(x_train, y_train, x_test, y_test, ID):
    for t in trange(M // procs_num):  # each worker handles its share of the samples
        ''' calculate Shapley value with Monte Carlo
        ...
        '''
    # print('task of proc' + str(ID) + ' completed!')
    return sv  # type(sv) -> list


def eval_shap_multi_proc(x_train, y_train, x_test, y_test):
    proc_ids = list(range(procs_num))

    pool = Pool(procs_num)
    func = partial(sub_task, x_train, y_train, x_test, y_test)
    ret = pool.map(func, proc_ids)
    pool.close()
    pool.join()
    ret_arr = np.asarray(ret)
    return np.sum(ret_arr, axis=0)
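Since the Monte Carlo loop body itself is elided above, here is a minimal self-contained sketch of one way the per-permutation update could look. Everything here (`mc_shapley`, the additive toy game, `weights`) is an illustrative stand-in, not the code used for the timings below:

```python
import numpy as np


def mc_shapley(value, n_players, M, seed=None):
    """Estimate Shapley values from M sampled permutations.

    value: maps a frozenset of player indices to a real number.
    """
    rng = np.random.default_rng(seed)
    sv = np.zeros(n_players)
    for _ in range(M):
        perm = rng.permutation(n_players)
        coalition = []
        prev = value(frozenset())
        for p in perm:
            coalition.append(p)
            cur = value(frozenset(coalition))
            sv[p] += cur - prev  # marginal contribution of player p
            prev = cur
    return sv / M


# Toy additive game: v(S) = sum of weights in S, whose exact Shapley
# values are the weights themselves.
weights = np.array([1.0, 2.0, 3.0])
est = mc_shapley(lambda S: sum(weights[i] for i in S), 3, M=200, seed=0)
```

For an additive game every permutation yields the same marginal contributions, so the estimate matches the exact Shapley values; a real model-based value function would of course be noisier.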

Time Cost Comparison

Test on MacBook Pro (16-inch, 2019)

Intel Core i7, 6 cores, 2.6 GHz

16 GB 2667 MHz DDR4

# multi
100%|██████████| 6000/6000 [01:23<00:00, 71.56it/s]
100%|██████████| 6000/6000 [01:24<00:00, 70.99it/s]
100%|██████████| 6000/6000 [01:26<00:00, 69.50it/s]
100%|██████████| 6000/6000 [01:26<00:00, 69.49it/s]
100%|██████████| 6000/6000 [01:26<00:00, 69.28it/s]

# single
100%|██████████| 30000/30000 [06:31<00:00, 76.65it/s]

With multiprocessing, the wall-clock time drops from roughly 6 min 31 s to roughly 1 min 26 s, about a quarter of the original.

And the results are close as well (the two runs sample different permutations, so they are not expected to match exactly).

# multi
[1172.625       843.54166667 1308.43333333 1159.20833333 1560.875
 1327.68333333  991.15        762.61666667 1379.56666667  520.84166667
  994.625       906.79166667 1050.1         805.875       987.28333333
 1025.06666667  923.14166667 1115.40833333  995.43333333  760.525
  916.14166667  687.3         892.63333333  912.675      1051.75833333
 1041.11666667 1010.83333333  718.66666667  799.10833333  960.50833333]
 
# single
[1155.29166667  863.43333333 1308.04166667 1189.74166667 1561.99166667
 1379.025       970.75        798.75       1313.13333333  522.35833333
  985.5         868.38333333 1056.575       757.6        1018.25833333
 1027.33333333  936.275      1033.80833333  991.68333333  789.7
  886.95        729.475       889.26666667  908.63333333 1066.10833333
 1056.36666667 1049.075       700.33333333  822.53333333  945.99166667]

Chinese Version

Body

To the best of my knowledge, most sampling-based algorithms can easily be run in multiple processes / in parallel, since the drawn samples can be split into parts.

This short post is a reminder to myself that even simple multiprocessing/parallelism can bring a decent payoff.

What follows is a crude Python example of computing Shapley values with Monte Carlo.

  • single processing

See above.

  • multiprocessing

See above.

Time Cost Comparison

See above.