Python Linux shared memory

Possible to share in-memory data between 2 separate processes?

I have an xmlrpc server using Twisted. The server has a huge amount of data stored in memory. Is it possible to have a secondary, separate xmlrpc server running which can access the in-memory object in the first server? So, serverA starts up and creates an object; serverB starts up and can read from the object in serverA.

EDIT: The data to be shared is a list of 1 million tuples.

10 Answers

Without some deep and dark rewriting of the Python core runtime (to allow forcing of an allocator that uses a given segment of shared memory and ensures compatible addresses between disparate processes) there is no way to "share objects in memory" in any general sense. That list will hold a million addresses of tuples, each tuple made up of the addresses of all of its items, and each of these addresses will have been assigned by pymalloc in a way that inevitably varies among processes and spreads all over the heap.

On just about every system except Windows, it's possible to spawn a subprocess that has essentially read-only access to objects in the parent process's space, as long as the parent process doesn't alter those objects either. That is obtained with a call to os.fork(), which in practice "snapshots" the entire memory space of the current process and starts another process running on the copy/snapshot. On all modern operating systems, this is actually very fast thanks to a "copy on write" approach: pages of virtual memory that are not altered by either process after the fork are not really copied (access to the same pages is shared instead); as soon as either process modifies any bit in a previously shared page, poof, that page is copied and the page table modified, so the modifying process now has its own copy while the other process still sees the original one.
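For concreteness, here is a minimal sketch of the fork-based sharing just described (POSIX only; names are illustrative). Note that even merely reading the data touches reference counts, which already forces some page copies, which is exactly the caveat discussed next:

import os

big_data = list(range(1_000_000))   # built before the fork, so the child inherits it

pid = os.fork()                     # POSIX only; not available on Windows
if pid == 0:
    # Child: sees the parent's objects via copy-on-write pages.
    print("child sees", len(big_data), "items")
    os._exit(0)
else:
    os.waitpid(pid, 0)
    print("parent continues with its own view of the data")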

This extremely limited form of sharing can still be a lifesaver in some cases (although it's extremely limited: remember, for example, that adding a reference to a shared object counts as "altering" that object, due to reference counts, and so will force a page copy!), and of course it is not available on Windows. With this single exception (which I don't think will cover your use case), sharing of object graphs that include references/pointers to other objects is basically unfeasible, and just about any set of objects of interest in modern languages (including Python) falls under this classification.


In extreme (but sufficiently simple) cases one can obtain sharing by renouncing the native memory representation of such object graphs. For example, a list of a million tuples, each with sixteen floats, could actually be represented as a single block of 128 MB of shared memory: all 16 million floats in double-precision IEEE representation laid end to end, with a little shim on top to "make it look like" you're addressing things in the normal way (and, of course, the not-so-little-after-all shim would also have to take care of the extremely hairy inter-process synchronization problems that are certain to arise;-). It only gets hairier and more complicated from there.
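On Python 3.8+ the standard library's multiprocessing.shared_memory module makes such a flat layout reasonably easy to build. Here is a minimal sketch of that kind of shim, with synchronization deliberately omitted and all names illustrative:

import struct
from multiprocessing import shared_memory

N_TUPLES = 1_000_000
ROW = struct.Struct("16d")          # one "tuple": 16 IEEE doubles laid end to end

shm = shared_memory.SharedMemory(create=True, size=N_TUPLES * ROW.size)  # 128 MB

def set_row(i, values):             # the shim: index arithmetic instead of pointers
    ROW.pack_into(shm.buf, i * ROW.size, *values)

def get_row(i):
    return ROW.unpack_from(shm.buf, i * ROW.size)

set_row(42, tuple(float(k) for k in range(16)))
print(get_row(42)[:3])              # (0.0, 1.0, 2.0)

# Another process attaches with shared_memory.SharedMemory(name=shm.name).
shm.close()
shm.unlink()                        # destroy the segment when fully done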

Modern approaches to concurrency increasingly disdain shared-anything designs in favor of shared-nothing ones, where tasks communicate by message passing. Even in multi-core systems that use threading and shared address spaces, the synchronization issues and the performance costs the hardware incurs (caching, pipeline stalls, and so on) when large areas of memory are actively modified by multiple cores at once are pushing people away.

For example, the multiprocessing module in Python's standard library relies mostly on pickling and sending objects back and forth, not on sharing memory (and surely not in a read/write way!).

I realize this is not welcome news to the OP, but if he does need to put multiple processors to work, he'd better plan on having anything the processes must share reside in a place where it can be accessed and modified by message passing: a database, a memcache cluster, a dedicated process that does nothing but keep those data in memory and send and receive them on request, and other such message-passing-centric architectures.
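For instance, the "dedicated process" variant could look like the minimal sketch below; the data and the index-based request protocol are made up for illustration, and note that every reply is pickled and copied rather than shared:

import multiprocessing as mp

def data_server(conn, data):
    # Owns the data; clients request items by sending an index.
    while True:
        idx = conn.recv()
        if idx is None:            # shutdown sentinel
            break
        conn.send(data[idx])       # the reply is pickled and copied, not shared

if __name__ == "__main__":
    parent, child = mp.Pipe()
    server = mp.Process(target=data_server,
                        args=(child, [(i, i * i) for i in range(1000)]))
    server.start()
    parent.send(7)
    print(parent.recv())           # prints (7, 49)
    parent.send(None)
    server.join()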


Shared memory for processes in Python and beyond

I have a large multi-process application, with worker processes, various GUIs, loggers in separate processes, and so on, built on the multiprocessing module. The worker processes crunch large amounts of data, and for that data multiprocessing.Array is used, like this:

import ctypes
import multiprocessing as mp

class Worker(mp.Process):
    def __init__(self, buffers, pipe, other):
        super().__init__()
        self._buffers = buffers
        self._pipe = pipe

    def run(self):
        param = get_message(self._pipe)
        big_calculations(self._buffers[param.a], param.b, ...)

buffers = []
for i in range(10):
    buffers.append(mp.Array(ctypes.c_uint8, buffer_size, lock=False))
...
for i in range(10):
    p1, p2 = mp.Pipe()
    worker = Worker(buffers, p1, other_param)
    worker.start()

No locking from mp.Array is needed: the processes synchronize by sending/receiving messages over p1/p2, hence lock=False.

Questions:
* The number and size of the buffers are fixed before the worker processes start. What is the right way to change the number or size of the buffers once the workers are already running? Sending a message about it is the easy part; what is unclear is how to close an existing buffer and open a new one (see the sketch after this list).
* What is under the hood of mp.Array? I have noticed that Python opens a lot of files with names like /tmp/pymp-ixc54qx7/pym-27111-h7wi_sy3. Is that related to mp.Array?
* It is highly desirable that this shared memory be accessible not only from Python processes. Is there any way to open it from a third-party process written in something else? Or should I perhaps use something other than mp.Array?
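A hedged sketch of one approach to the first and third questions, assuming Python 3.8+ and its multiprocessing.shared_memory module: segments are named, can be created after the workers have started (you just send the name over the pipe), and on Linux live under /dev/shm, where a non-Python process can shm_open()/mmap() them by the same name. (As for the second question, the /tmp/pymp-*/pym-* files are almost certainly the mmap-backed arenas from which multiprocessing allocates mp.Array memory.)

import multiprocessing as mp
from multiprocessing import shared_memory

def worker(pipe):
    while True:
        name = pipe.recv()
        if name is None:
            break
        shm = shared_memory.SharedMemory(name=name)   # attach to the announced segment
        print("worker sees", shm.buf[0], "in", name)
        shm.close()                                   # detach; does not destroy it

if __name__ == "__main__":
    p1, p2 = mp.Pipe()
    w = mp.Process(target=worker, args=(p2,))
    w.start()
    # Create a brand-new buffer *after* the worker has started, then announce it.
    shm = shared_memory.SharedMemory(create=True, size=1024)
    shm.buf[0] = 42
    p1.send(shm.name)     # on Linux the segment is visible as /dev/shm/<name>
    p1.send(None)
    w.join()
    shm.close()
    shm.unlink()          # actually destroys the segment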

Similar questions about mp.Pipe() (a sketch follows this list):
* How do I hand the end of a newly created pipe to an already-running process?
* How do I pass a pipe end to a non-Python process?
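A hedged sketch of both points, assuming Python 3 on POSIX. multiprocessing can pickle a Connection object across an existing Connection, transferring the underlying file descriptor; for a non-Python peer, mp.Pipe is a poor fit (it speaks a Python-specific pickle protocol), and a named pipe (FIFO) carrying a plain byte stream is the simpler choice:

import multiprocessing as mp
import os

def worker(ctrl):
    new_end = ctrl.recv()              # a Connection created after the worker started
    print("over the new pipe:", new_end.recv())

if __name__ == "__main__":
    ctrl_a, ctrl_b = mp.Pipe()
    w = mp.Process(target=worker, args=(ctrl_b,))
    w.start()
    new_a, new_b = mp.Pipe()           # created while the worker is already running
    ctrl_a.send(new_b)                 # the fd travels over the control pipe
    new_a.send("hello")
    w.join()

    # For a non-Python peer: a FIFO is just a file any language can open.
    os.mkfifo("/tmp/my_fifo")          # the other process open()s and read()s it
    os.unlink("/tmp/my_fifo")          # cleanup; shown here only for completeness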


Shared memory in multiprocessing

I have three large lists. The first contains bitarrays (module bitarray 0.8.0) and the other two contain arrays of integers.

l1 = [bitarray 1, bitarray 2, ..., bitarray n]
l2 = [array 1, array 2, ..., array n]
l3 = [array 1, array 2, ..., array n]

multiprocessing.Process(target=someFunction, args=(l1, l2, l3))

Does this mean that l1, l2 and l3 will be copied for each sub-process, or will the sub-processes share these lists? Or, to be more direct, will I use 16 GB or 192 GB of RAM?

someFunction will read some values from these lists and then perform some calculations based on the values read. The results will be returned to the parent process. The lists l1, l2 and l3 will not be modified by someFunction. Therefore I would assume that the sub-processes do not need to copy these huge lists, and would instead just share them with the parent, meaning that the program would take 16 GB of RAM (regardless of how many sub-processes I start) thanks to the copy-on-write approach under Linux. Am I correct, or am I missing something that would cause the lists to be copied?

EDIT: I am still confused after reading a bit more on the subject. On the one hand, Linux uses copy-on-write, which should mean that no data is copied. On the other hand, accessing an object changes its ref-count (I am still unsure why, and what that means). Even so, will the entire object be copied? For example, suppose I define someFunction as follows:

import random

def someFunction(list1, list2, list3):
    i = random.randint(0, 99999)
    print(list1[i], list2[i], list3[i])

Would using this function mean that l1, l2 and l3 will be copied entirely for each sub-process? Is there a way to check for this?

EDIT2: After reading a bit more and monitoring the total memory usage of the system while sub-processes are running, it seems that entire objects are indeed copied for each sub-process, and it seems to be because of reference counting.

The reference counting for l1, l2 and l3 is actually unneeded in my program, because l1, l2 and l3 will be kept in memory (unchanged) until the parent process exits. There is no need to free the memory used by these lists until then. In fact, I know for sure that the reference count will remain above 0 (for these lists and every object in them) until the program exits. So now the question becomes: how can I make sure that the objects will not be copied to each sub-process? Can I perhaps disable reference counting for these lists and each object in them?

EDIT3: Just an additional note. The sub-processes do not need to modify l1, l2 and l3 or any objects in these lists. They only need to be able to reference some of these objects without causing the memory to be copied for each sub-process.
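One way to sidestep the refcount problem is to keep the bulk data out of Python objects altogether: a shared ctypes block has a single refcount for the whole array rather than one per element, so child reads touch only raw memory. A hedged sketch for the two integer lists (the bitarray part is omitted, and whether this fits depends on whether the data can be flattened like this):

import multiprocessing as mp
import random

def someFunction(a1, a2):
    # Indexing a shared ctypes array reads raw shared memory; no per-element
    # Python objects exist in the parent whose refcounts could be touched.
    i = random.randint(0, len(a1) - 1)
    print(a1[i], a2[i])

if __name__ == "__main__":
    n = 100_000
    l2 = mp.Array('i', range(n), lock=False)   # one shared block, one refcount
    l3 = mp.Array('i', range(n), lock=False)
    procs = [mp.Process(target=someFunction, args=(l2, l3)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()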

