Intermediate result storage

Sometimes, especially with big payloads, alternative transfer methods are worth considering, since transferring huge data streams using messages can be quite expensive.

There are multiple approaches to this issue, mostly revolving around transferring data via external means. Some of them are explained here.

Shared memory

Since Python 3.8, the multiprocessing.shared_memory module enables multiple processes (running on the same system) to read from and write to the same memory buffer.

Shared memory is exposed as a buffer we can later read and write to, and as such we can use it as a transport for our data.

This feature plays quite nicely with uActor actors.

Example:

import os
import multiprocessing.managers
import multiprocessing.shared_memory

import uactor

class SharedActor(uactor.Actor):
    def __init__(self):
        self.shared = multiprocessing.managers.SharedMemoryManager()
        self.shared.start()

    def shutdown(self):
        self.shared.shutdown()

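    def get_shared_res(self):
        data = f'Shared memory from {os.getpid()}.'.encode()
        # Allocate a block tracked by the manager, sized to fit the payload
        shared = self.shared.SharedMemory(len(data))
        # Slice explicitly: some platforms round the block size up
        shared.buf[:len(data)] = data
        return shared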

with SharedActor() as actor:
    shared = actor.get_shared_res()
    print(f'Running on {os.getpid()}')
    # Running on 11209
    print(shared.buf.tobytes().decode())
    # Shared memory from 47989.
    shared.unlink()

Any kind of structured data, not just bytes, can be transferred this way as long as it is serialized first, for example using pickle.
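As a minimal sketch of that idea, the snippet below pickles a dictionary into a shared memory block and restores it by name. It uses multiprocessing.shared_memory directly instead of an actor so it can run on its own, and both sides live in the same process for brevity. The helper names dump_to_shared and load_from_shared are hypothetical, not part of uActor.

```python
import pickle
from multiprocessing import shared_memory


def dump_to_shared(obj):
    """Pickle obj into a new shared memory block (hypothetical helper)."""
    payload = pickle.dumps(obj)
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    # The block may be larger than requested, so write to an explicit slice
    shm.buf[:len(payload)] = payload
    return shm, len(payload)


def load_from_shared(name, length):
    """Attach to an existing block by name and unpickle its content."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # Copy only the payload bytes out of the buffer before unpickling
        return pickle.loads(bytes(shm.buf[:length]))
    finally:
        shm.close()


record = {'id': 7, 'tags': ['a', 'b'], 'score': 0.5}
shm, length = dump_to_shared(record)
try:
    # A consumer would normally do this from another process
    restored = load_from_shared(shm.name, length)
finally:
    shm.close()
    shm.unlink()  # release the block once nobody needs it anymore

print(restored == record)
```

The payload length travels alongside the block name because the allocated block can be bigger than the pickled data. As with any use of pickle, only data from trusted producers should be loaded this way.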

External data stores

Using a centralized broker where all processes can concurrently store and retrieve data can be considered, especially when distributing actors over the network.

Some in-memory databases (such as memcached or redis) are especially good at storing temporary data, but even traditional databases or dedicated blob storage services (such as min.io) could be used, making those resources accessible to actors across the network.

For local-only actors, tempfile.NamedTemporaryFile could also be a good option when used alongside RAM-based virtual filesystems like tmpfs. Regular filesystems should only be considered for objects being forwarded across many actors, since even the fastest block device is still slower than regular multiprocessing mechanisms.

Integrating with external services is a big subject of its own, and it is outside the scope of this documentation.
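The temporary file approach could look roughly like the sketch below, where one side stores a payload and passes only the file path along, and the other side reads and removes it. It assumes /dev/shm is a tmpfs mount (common on Linux, but not guaranteed) and falls back to the default temporary directory otherwise. The helper names spool_dir, store_payload and load_payload are hypothetical.

```python
import os
import tempfile


def spool_dir():
    """Return /dev/shm when usable (assumed tmpfs), else None for the default."""
    candidate = '/dev/shm'
    if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
        return candidate
    return None


def store_payload(data):
    """Write data to a named temporary file and return its path."""
    # delete=False keeps the file alive after closing, so others can open it
    with tempfile.NamedTemporaryFile(
            dir=spool_dir(), prefix='payload-', delete=False) as f:
        f.write(data)
        return f.name


def load_payload(path):
    """Read the payload back and remove the file, as its last consumer."""
    try:
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.unlink(path)


payload = b'x' * 1024
path = store_payload(payload)  # only this short path needs to be messaged
received = load_payload(path)
print(len(received))
```

Here the reader deletes the file, which fits a single consumer. When the same object is forwarded across several actors, cleanup would have to be left to whichever actor uses it last.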