
python-multiprocessing

multiprocessing on tee'd generators

Consider the following script, in which I test two ways of performing some calculations on generators obtained by itertools.tee:

    #!/usr/bin/env python3
    from sys import argv
    from itertools import tee
    from multiprocessing import Process

    def my_generator():
        for i in range(5):
            print(i)
            yield i

    def double(x):
        return 2 * x

    def compute_double_sum(iterable):
        s = sum(map(double, iterable))
        print(s)

    def square(x):
        return x * x

    def compute_square_sum(iterable):
        s = sum(map(square, iterable))
        print(s)

    g1, g2 = tee(my_generator(), 2)

    try:
        processing_type = argv[1]
    except IndexError:
        processing_type = "no
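The excerpt above is cut off before the branch that actually spawns the processes, so here is a minimal, hypothetical sketch (not the asker's full script) of the multiprocessing variant being tested: each tee'd iterator is handed to its own Process. It assumes a fork-based start method (e.g. Linux); after the fork each child owns an independent copy of the tee objects, so the underlying generator runs separately in each child rather than being shared between them.

    from itertools import tee
    from multiprocessing import Process

    def my_generator():
        for i in range(5):
            print(i)
            yield i

    def compute_double_sum(iterable):
        # sum of 2*x over the iterable, as in the original script
        print(sum(2 * x for x in iterable))

    def compute_square_sum(iterable):
        # sum of x*x over the iterable, as in the original script
        print(sum(x * x for x in iterable))

    if __name__ == "__main__":
        g1, g2 = tee(my_generator(), 2)
        # each child process consumes its own copy of the tee'd iterator
        p1 = Process(target=compute_double_sum, args=(g1,))
        p2 = Process(target=compute_square_sum, args=(g2,))
        p1.start(); p2.start()
        p1.join(); p2.join()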

2021-06-21 10:55:03    Category: Q&A    python   fork   itertools   python-multiprocessing

Can I use map / imap / imap_unordered with functions with no arguments?

Question: Sometimes I need to use multiprocessing with functions that take no arguments. I wish I could do something like this:

    from multiprocessing import Pool

    def f():  # no argument
        return 1

    # TypeError: f() takes no arguments (1 given)
    print Pool(2).map(f, range(10))

I can do Process(target=f, args=()), but I prefer the syntax of map / imap / imap_unordered. Is there a way to do this?

Answer 1: The first argument to map should be a function, and that function must accept one argument. This is mandatory because the iterable passed as the second argument is iterated over and its values are passed to the function one by one. So the best you can do is either redefine f to take one argument and ignore it, or write a wrapper function that takes one argument, ignores it, and returns f's return value, like this:

    from multiprocessing import Pool

    def f():  # no argument
        return 1

    def throw_away_function(_):
        return f()

    print(Pool(2).map(throw_away_function, range(10)))
    # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
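A small additional sketch, not part of the original answer: in Python 3, Pool.starmap unpacks each element of the iterable as the argument list, so passing empty tuples calls f() with no arguments at all, without a wrapper function.

    from multiprocessing import Pool

    def f():  # takes no arguments
        return 1

    if __name__ == "__main__":
        with Pool(2) as pool:
            # each () is unpacked to zero arguments, i.e. f() is called
            print(pool.starmap(f, [()] * 10))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]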

2021-06-21 10:00:50    Category: Tech Sharing    python   python-multiprocessing

Python Multiprocessing: TypeError: __new__() missing 1 required positional argument: 'path'

I'm currently trying to run a parallel process in python 3.5 using the joblib library with the multiprocessing backend. However, every time it runs I get this error:

    Process ForkServerPoolWorker-5:
    Traceback (most recent call last):
      File "/opt/anaconda3/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
        self.run()
      File "/opt/anaconda3/lib/python3.5/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/opt/anaconda3/lib/python3.5/multiprocessing/pool.py", line 108, in worker
        task = get()
      File "/opt/anaconda3/lib/python3.5/site-packages

2021-06-15 11:00:48    Category: Q&A    python   runtime-error   python-multithreading   python-multiprocessing   joblib

Multiprocessing slower than serial processing in Windows (but not in Linux)

I'm trying to parallelize a for loop to speed up my code, since the loop's operations are all independent. Following online tutorials, the standard multiprocessing library in Python seemed a good starting point, and I've got it working for basic examples. However, for my actual use case I find that parallel processing (on a dual-core machine) is actually slightly (<5%) slower when run on Windows. Running the same code on Linux, however, gives a parallel speed-up of ~25% compared to serial execution. From the docs, I believe this may relate to Windows' lack of fork(
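A generic sketch of the pattern usually recommended for Windows' spawn start method (the work function and data here are hypothetical stand-ins, not the asker's code): keep all multiprocessing setup under the __main__ guard, create the Pool once, and use a large chunksize so the per-task IPC overhead does not eat the parallel gain.

    import multiprocessing as mp

    def work(x):
        # hypothetical CPU-bound task; replace with the real loop body
        return sum(i * i for i in range(x))

    if __name__ == "__main__":              # required under spawn (Windows)
        data = list(range(10_000))
        with mp.Pool() as pool:
            # a larger chunksize reduces per-task communication overhead,
            # which matters more when workers are spawned rather than forked
            results = pool.map(work, data, chunksize=500)
        print(len(results))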

2021-06-15 03:36:07    Category: Q&A    python   multithreading   parallel-processing   multiprocessing   python-multiprocessing

How to retrieve values from a function run in parallel processes?

The multiprocessing module is quite confusing for Python beginners, especially those who have just migrated from MATLAB and been spoiled by its parallel computing toolbox. I have the following code, which takes ~80 seconds to run, and I want to shorten that time using Python's multiprocessing module.

    from time import time

    xmax = 100000000
    start = time()
    for x in range(xmax):
        y = ((x+5)**2 + x - 40)
        if y <= 0xf+1:
            print('Condition met at: ', y, x)
    end = time()
    tt = end - start  # total time
    print('Each iteration took: ', tt/xmax)
    print('Total time: ', tt)

This outputs as expected: Condition met
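A minimal sketch of the general pattern for getting values back from worker processes (not necessarily the accepted answer): split the range into chunks, have each worker return the matches it finds, and let Pool.map collect the per-chunk lists in the parent.

    from multiprocessing import Pool

    XMAX = 100_000_000
    NCHUNKS = 8

    def check_chunk(bounds):
        lo, hi = bounds
        hits = []
        for x in range(lo, hi):
            y = (x + 5) ** 2 + x - 40
            if y <= 0xf + 1:
                hits.append((x, y))
        return hits                      # returned to the parent via pickling

    if __name__ == "__main__":
        step = XMAX // NCHUNKS
        bounds = [(i * step, min((i + 1) * step, XMAX)) for i in range(NCHUNKS)]
        with Pool() as pool:
            per_chunk = pool.map(check_chunk, bounds)
        for hits in per_chunk:
            for x, y in hits:
                print('Condition met at: ', y, x)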

2021-06-14 13:27:55    Category: Q&A    python   python-3.x   parallel-processing   multiprocessing   python-multiprocessing

Returning multiple lists from pool.map processes?

Win 7, x64, Python 2.7.12. In the following code I am setting off some pool processes to do a trivial multiplication via the multiprocessing.Pool.map() method. The output data is collected in List_1. NOTE: this is a stripped-down simplification of my actual code. There are multiple lists involved in the real application, all huge.

    import multiprocessing
    import numpy as np

    def createLists(branches):
        firstList = branches[:] * node
        return firstList

    def init_process(lNodes):
        global node
        node = lNodes
        print 'Starting', multiprocessing.current_process().name

    if __name__ == '__main__':
        mgr =
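A hedged sketch (written in Python 3, unlike the Python 2 original) of one common way to get several lists back from pool.map without a Manager: have each worker return a tuple of lists and unzip the results in the parent. The task data and the second list are made up for illustration.

    import multiprocessing

    def init_process(lNodes):
        global node
        node = lNodes

    def create_lists(branches):
        # hypothetical: return two derived lists per task
        first = [b * node for b in branches]
        second = [b * b for b in branches]
        return first, second

    if __name__ == '__main__':
        tasks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
        with multiprocessing.Pool(2, initializer=init_process, initargs=(10,)) as pool:
            results = pool.map(create_lists, tasks)
        # zip(*...) turns the list of (first, second) tuples into two sequences
        first_lists, second_lists = zip(*results)
        print(first_lists)
        print(second_lists)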

2021-06-14 08:46:05    Category: Q&A    python   threadpool   python-multiprocessing

Changing the Buffer size in multiprocessing.Queue

So I have a system in which a producer and a consumer are connected by a queue of unlimited size, but if the consumer repeatedly calls get until the Empty exception is thrown, the queue is not actually drained. I believe this is because the queue's feeder thread, which serialises the objects into the underlying pipe, blocks once the pipe's buffer is full and then waits until there is space again. It is therefore possible for the consumer to call get "too fast", so that it thinks the queue is empty when in fact the thread on the other side has much more data to send but just cannot
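A small illustrative sketch (not from the original question) of the usual workaround: instead of treating queue.Empty as "the producer is done", have the producer put an explicit sentinel, so the consumer is never fooled by the feeder thread lagging behind the socket buffer.

    from multiprocessing import Process, Queue

    SENTINEL = None

    def producer(q):
        for i in range(100_000):
            q.put(i)
        q.put(SENTINEL)          # explicit end-of-stream marker

    def consumer(q):
        count = 0
        while True:
            item = q.get()       # blocks; no race with the feeder thread
            if item is SENTINEL:
                break
            count += 1
        print('consumed', count, 'items')

    if __name__ == '__main__':
        q = Queue()
        p = Process(target=producer, args=(q,))
        c = Process(target=consumer, args=(q,))
        p.start(); c.start()
        p.join(); c.join()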

2021-06-14 04:45:34    Category: Q&A    python   python-2.7   python-multiprocessing

Multiprocessing Pool with a for loop

I have a list of files that I pass into a for loop, running a whole bunch of functions on each. What's the easiest way to parallelize this? I couldn't find this exact case anywhere, and I think my current implementation is incorrect because I only ever saw one file being run. From some reading I've done, I believe this should be a perfectly parallel case. The old code is something like this:

    import pandas as pd

    filenames = ['file1.csv', 'file2.csv', 'file3.csv', 'file4.csv']
    for file in filenames:
        file1 = pd.read_csv(file)
        print('running ' + str(file))
        a = function1(file1)
        b = function2(a)
        c = function3(b)
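A minimal sketch of the usual rewrite: move the loop body into one function that takes a filename and map it over the list. function1/function2/function3 are the question's placeholders; the trivial stand-in definitions and the toy CSV files below exist only so the sketch actually runs.

    import pandas as pd
    from multiprocessing import Pool

    def function1(df): return df.sum()   # stand-ins for the question's functions
    def function2(a):  return a * 2
    def function3(b):  return b.to_dict()

    def process_file(file):
        print('running ' + str(file))
        df = pd.read_csv(file)
        return function3(function2(function1(df)))

    if __name__ == '__main__':
        filenames = ['file1.csv', 'file2.csv', 'file3.csv', 'file4.csv']
        for name in filenames:            # create toy inputs so the sketch runs
            pd.DataFrame({'x': [1, 2, 3]}).to_csv(name, index=False)
        with Pool() as pool:
            results = pool.map(process_file, filenames)
        print(results)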

2021-06-14 03:54:24    Category: Q&A    python-3.x   python-multiprocessing

No space left while using Multiprocessing.Array in shared memory

I am using the multiprocessing functions of Python to run my code in parallel on a machine with roughly 500 GB of RAM. To share some arrays between the different workers I am creating an Array object:

    N = 150
    ndata = 10000
    sigma = 3
    ddim = 3

    shared_data_base = multiprocessing.Array(ctypes.c_double, ndata*N*N*ddim*sigma*sigma)
    shared_data = np.ctypeslib.as_array(shared_data_base.get_obj())
    shared_data = shared_data.reshape(-1, N, N, ddim*sigma*sigma)

This works perfectly for sigma=1, but for sigma=3 one of the hard drives of the machine slowly fills up until there is no free space left and
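A small worked check (not from the question itself) of how much backing storage that call asks for: the element count scales with sigma squared, so going from sigma=1 to sigma=3 multiplies the allocation by nine.

    # size of multiprocessing.Array(ctypes.c_double, ndata*N*N*ddim*sigma*sigma)
    N, ndata, ddim = 150, 10000, 3

    for sigma in (1, 3):
        n_elements = ndata * N * N * ddim * sigma * sigma
        gib = n_elements * 8 / 2**30      # a c_double is 8 bytes
        print(f"sigma={sigma}: {n_elements:,} doubles ~ {gib:.1f} GiB")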

2021-06-13 17:40:56    Category: Q&A    python   arrays   multiprocessing   python-multiprocessing

Python Multiprocessing a large dataframe on Linux

As the title says, I have a big data frame (df) that needs to be processed row-wise. Since df is big (6 GB), I want to use Python's multiprocessing package to speed this up. Below is a toy example; given my writing skill and the complexity of the task, I'll describe briefly what I want to achieve and leave the details to the code. From the original data df I want to perform some row-wise analysis (order does not matter) that requires not just the focal row itself but also other rows that satisfy certain conditions. Below are the toy data and my code:

    import pandas as pd
    import numpy
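A generic sketch of the usual chunking pattern on Linux (the toy frame and the analyse_chunk function here are hypothetical, not the asker's): split the row indices, let each worker process its own slice, and concatenate the results. Under fork the children inherit df via copy-on-write, so every worker can still read the full frame when a row's analysis depends on other rows.

    import pandas as pd
    import numpy as np
    from multiprocessing import Pool

    # toy stand-in for the real 6 GB frame
    df = pd.DataFrame({'id': range(1000), 'value': np.random.rand(1000)})

    def analyse_chunk(index_chunk):
        # each worker sees the full df (inherited via fork) but only
        # computes results for its own slice of row indices
        out = []
        for i in index_chunk:
            row = df.loc[i]
            # hypothetical row-wise analysis that looks at other rows too
            out.append((i, int((df['value'] > row['value']).sum())))
        return out

    if __name__ == '__main__':
        chunks = np.array_split(df.index, 8)
        with Pool(8) as pool:
            parts = pool.map(analyse_chunk, chunks)
        results = pd.DataFrame([r for part in parts for r in part],
                               columns=['row', 'n_larger'])
        print(results.head())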

2021-06-13 02:44:57    Category: Q&A    python   pandas   python-multiprocessing