Tag: multiprocessing

  • Python multiprocessing RawArray: "no disk space" error, or very slow allocation

    On Linux-based systems, Python appears to write to /tmp when allocating a multiprocessing.sharedctypes.RawArray. If that path does not have enough free disk space, a "no disk space" error occurs.
    The fix is to point the TMPDIR environment variable at a different directory, using one of the methods below:

    • bash: export TMPDIR='/the/new/path'
    • bash: TMPDIR='/the/new/path' python3 your_script.py
    • python: os.environ['TMPDIR'] = '/the/new/path' (set it before anything is allocated)

    As a side note, using /dev/shm as the temporary directory improved performance in my case.
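
    The Python variant of the fix can be sketched as below. This is a minimal sketch, not a definitive recipe: the helper name allocate_with_tmpdir and the scratch directory are made up for illustration, and whether multiprocessing actually backs the array with a file under TMPDIR depends on the Python version and platform. The sketch also resets tempfile.tempdir, because the tempfile module caches its choice of directory after the first lookup.

```python
import os
import tempfile
from multiprocessing.sharedctypes import RawArray


def allocate_with_tmpdir(tmp_path, n_items):
    """Point TMPDIR at tmp_path, then allocate a shared array of n_items doubles.

    Hypothetical helper for illustration only.
    """
    os.environ["TMPDIR"] = tmp_path
    # tempfile caches the directory it picked; clearing the cache makes the
    # next lookup re-read TMPDIR.
    tempfile.tempdir = None
    return RawArray("d", n_items)


if __name__ == "__main__":
    # Assumed scratch location; substitute a directory with enough free
    # space (for example a path under /dev/shm).
    scratch = tempfile.mkdtemp()
    arr = allocate_with_tmpdir(scratch, 1_000_000)
    print(len(arr), tempfile.gettempdir())
```

    Setting the variable in the shell before starting Python avoids the caching issue entirely, which is why the two bash methods are the simpler choice when they are available.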

  • Python multiprocessing: principles for parallelization

    When using multiprocessing to tackle tricky parallel problems, follow these strategies:

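    • Split the work into independent units;
    • If the time each job takes is variable, consider randomizing the order of the jobs;
    • Sorting the work queue so that the slowest tasks are processed first may be the most useful strategy (on average);
    • For small, trivial tasks, consider merging them into chunks, which effectively reduces the fork/join communication overhead;
    • Match the number of jobs to the number of physical CPUs.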
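
    The strategies above can be sketched with multiprocessing.Pool. This is only an illustrative sketch under stated assumptions: work, estimated_cost, order_tasks and pick_chunksize are hypothetical names, the cost model (larger input means slower task) is invented for the example, and os.cpu_count() reports logical CPUs, so the physical core count may be lower.

```python
import os
import random
from multiprocessing import Pool


def estimated_cost(task):
    """Hypothetical cost model: assume larger inputs take longer."""
    return task


def work(task):
    """Stand-in for one independent unit of work."""
    return sum(i * i for i in range(task))


def order_tasks(tasks, costs_known=True, seed=0):
    """Slowest-first when costs can be estimated, otherwise shuffled."""
    ordered = list(tasks)
    if costs_known:
        ordered.sort(key=estimated_cost, reverse=True)
    else:
        random.Random(seed).shuffle(ordered)
    return ordered


def pick_chunksize(n_tasks, n_workers, chunks_per_worker=4):
    """Batch small tasks so each worker gets a few chunks instead of single tasks."""
    return max(1, n_tasks // (n_workers * chunks_per_worker))


if __name__ == "__main__":
    tasks = [10, 5000, 200, 90000, 30, 7000] * 50
    # Logical CPU count; lower this if the physical core count is known.
    n_workers = os.cpu_count() or 1
    ordered = order_tasks(tasks)
    chunksize = pick_chunksize(len(ordered), n_workers)
    with Pool(processes=n_workers) as pool:
        results = pool.map(work, ordered, chunksize=chunksize)
    print(len(results))
```

    Sorting and chunking interact: with a large chunksize the slowest tasks end up in the same chunk, so slowest-first ordering pairs better with a small chunksize, while shuffling pairs well with larger chunks.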

    Partly excerpted from <High Performance Python> (by Micha Gorelick, Ian Ozsvald)