Concurrency
The concurrency approach of Hylo is still under design. This document only presents our current plans; everything is still subject to change.
Concurrency in Hylo is based on the following principles:
- Concurrent code has the same syntax/semantics as non-concurrent code.
  - No function colouring (e.g., marking functions with `async`).
  - No separate abstractions for expressing concurrency (e.g., `actor` abstractions).
- Concurrency needs to be structured. Following the main ideas from the structured concurrency literature, we mean the following by structured concurrency:
  - Use of abstractions as building blocks.
  - Ability to use recursive decomposition of the program.
  - Ability to use local reasoning.
  - Soundness and completeness: all concurrent programs can be safely modeled in the concurrency model.

  See this article for more details on structured concurrency.
- Functions encapsulate concurrency.
  - Concurrency can be expressed with the help of simple functions.
  - One doesn't need to understand the implementation of a function to understand how concurrency affects the outer code.
  - Functions have one entry point and one exit point, regardless of the concurrency expressed inside them.
- The user shall focus on expressing the constraints between work items.
  - There is no need for low-level synchronization primitives, like mutexes and semaphores.
- The user can understand and control the costs associated with concurrency.
  - Understanding how the application uses the `spawn` abstraction and how it interacts with non-Hylo code is enough to understand the costs associated with concurrency.
  - Hylo allows the user to fully customize the handling of concurrency.
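To get a first taste of the model, here is a minimal sketch of what spawning concurrent work might look like. The concrete surface syntax is still under design, so treat the `spawn` function and the `await` method on the handle it returns as placeholders rather than settled API:

```
public fun main() {
  // Start a new thread of execution running the given lambda.
  let handle = spawn(fun() {
    print("Hello, concurrent world!")
  })
  // Wait for the spawned work to complete before leaving main().
  handle.await()
}
```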
This code will print `Hello, concurrent world!` and will do this (most likely) from a different OS thread.
Let's have a slightly more complicated example:
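(Again a sketch with the same placeholder `spawn`/`await` API; `prime_number()` merely stands in for a longer, CPU-bound computation.)

```
fun do_greet() {
  print("Hello, concurrent world!")
}

fun concurrent_greeting() -> Int {
  let handle = spawn(fun() { do_greet() })  // start do_greet() on a new thread of execution
  let p = prime_number()                    // runs concurrently with do_greet()
  handle.await()                            // join the two threads of execution
  return p
}

// Stand-in for a longer, CPU-bound computation.
fun prime_number() -> Int {
  return 7919
}

public fun main() {
  print(concurrent_greeting())
}
```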
The above code will print:
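```
Hello, concurrent world!
7919
```

With the sketch above, the greeting is guaranteed to appear before the number: `main` prints the result only after `await` has joined the spawned thread.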
The `spawn` on line 6 will create a new thread of execution and call `do_greet()` on that thread. On multi-core machines, this may be executed in parallel with the call to `prime_number()`. The call to `await` will join the two threads; it will return the value produced by the spawned thread in the context of the original thread.
Assuming that there are two OS threads for this program, the call to `await` will not block either of them. Instead, the computation will continue on the OS thread that finishes last; in our example, this is most probably the spawned thread. This leads to an interesting arrangement: the OS thread on which `concurrent_greeting()` finishes is different from the OS thread on which the function started. We call this thread inversion. The OS thread that starts `concurrent_greeting()` will be returned to the middleware when `await` is called and can be reused to run other work items, thus avoiding the main performance problems associated with the classic threads-and-locks model.
The following code shows a basic implementation of a quick-sort algorithm that can take advantage of multicore systems:
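(The snippet below is a sketch only: `spawn`/`await` are the same placeholders as before, and the `ArraySlice` type, the `partition` helper, and the slice-splitting operation are assumed, with illustrative signatures.)

```
fun concurrent_sort(_ v: inout ArraySlice<Int>) {
  if v.count() < 2 { return }

  // Partition serially around a pivot: elements <= pivot end up on the left,
  // elements > pivot on the right. `partition` is an assumed helper.
  let mid = partition(&v)

  // Split the slice into two independent, non-overlapping halves (assumed operation).
  inout (lhs, rhs) = &v.split(at: mid)

  // Sort the left side concurrently with the right side.
  let handle = spawn(fun() { concurrent_sort(&lhs) })
  concurrent_sort(&rhs)
  handle.await()
}
```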
We break the initial array into two parts based on a pivot element: all the elements on the left side are less than or equal to the pivot, and all the elements on the right side are greater than it. This partitioning happens on a single thread. Then we sort the two slices of the array recursively; sorting the left side happens concurrently with sorting the right side.
Please note that the concurrency middleware won't just create an OS thread for each `spawn` invocation. For efficient utilisation of resources, the number of OS threads is typically equal to the number of cores of the target hardware. The user just specifies the concurrent behaviour of the application (which work items can run concurrently with which others), and the concurrency middleware ensures the proper arrangement of the work on the hardware.
The code allows the user to clearly express concurrency constraints:
- partitioning of the elements needs to happen serially, before either side is sorted;
- the two sides can be sorted in parallel.
Although the algorithm runs a significant chunk of the work serially, the performance boost on a multicore system comes from dividing the work into two chunks that can be executed concurrently. As we quickly break those chunks into halves again, we soon create work for all the cores in the system. The user will see performance improvements from the concurrent algorithm when the number of elements in each chunk is large, and when there are enough chunks to cover the hardware threads.
In the previous examples, we could only fork a thread of execution into two threads of execution. While this is enough from a theoretical perspective, in practice we want to create multiple threads of execution from a single invocation. Here is a simple example of how this can be achieved:
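(A sketch, assuming a hypothetical bulk-spawn operation, here called `spawn_all`, that starts one work item per index and returns a single handle covering all of them.)

```
fun greet(_ i: Int) {
  // A real greeting would format a message; printing the index keeps the sketch short.
  print(i)
}

public fun main() {
  // Start 19 work items, one for each number from 0 to 18.
  let handle = spawn_all(0 ..< 19, fun(_ i: Int) { greet(i) })
  // Wait until every spawned computation has completed.
  handle.await()
}
```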
This will call the `greet()` function 19 times, with numbers from 0 to 18. The `await()` call ensures that all spawned computations complete before `main()` ends.
By default, the concurrency middleware comes with a scheduler that dispatches incoming work to a pool of OS threads. The number of OS threads created depends on the actual number of hardware threads available on the target platform. This type of scheduler is optimised for executing CPU-intensive work.
However, the user can customise the scheduler used for spawning work, and provide versions that are better suited for things like I/O, networking, GPUs, or execution on different machines. Here is an example:
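(In this sketch, `ioScheduler` is assumed to be a scheduler dedicated to I/O work, passing a scheduler to `spawn` via an `on:` argument is placeholder syntax, and the file-handling helpers `read_file`, `parse`, and `FileContent` are likewise placeholders.)

```
fun load(_ path: String) -> FileContent {
  let data = spawn(on: ioScheduler, fun() { read_file(path) }).await()
  // At this point the data has been read; execution continues on a thread
  // provided by ioScheduler.
  return parse(data)
}
```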
In this example, the actual reading from the file happens on a different scheduler and will not block any of the threads associated with the default CPU scheduler. The current thread of execution will be suspended at line 2 until the data is read from the file; the code will then continue to execute on a thread provided by the given scheduler. To get back to one of the threads of the default scheduler, one can write:
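(Here `continue_on` and `default_scheduler()` are hypothetical names for the switching operation.)

```
fun load(_ path: String) -> FileContent {
  let data = spawn(on: ioScheduler, fun() { read_file(path) }).await()
  continue_on(default_scheduler())
  // From here on, the work runs on a thread of the default (CPU) scheduler.
  return parse(data)
}
```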
On line 3, we are actually switching threads. The old thread is released back to the `ioScheduler`, while a new thread from the default scheduler is used to continue the work.
Sometimes it's useful to cancel an already-running operation. Here is an example:
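(`stop_requested()`, `request_stop()`, and the `sleep_for` helper below are placeholder names, like the rest of the sketched API.)

```
public fun main() {
  let handle = spawn(fun() {
    while !stop_requested() {        // keep working until a stop is requested
      print("working")
      sleep_for(milliseconds: 100)
    }
  })
  sleep_for(milliseconds: 300)       // let the spawned work run for a while
  handle.request_stop()              // ask the spawned work to stop
  handle.await()                     // wait for it to notice the request and finish
}
```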
The above code will print `working` several times (roughly 3 times), and then complete. The spawned work will stop as soon as `stop_requested()` on line 3 returns `true`, which will happen after `request_stop()` is executed.
Dropping the handle returned by `spawn` indicates that the caller is no longer interested in the completion of the spawned work, so a stop signal is sent. The following code behaves the same as the previous snippet:
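(Same placeholder names as above; the only difference is that the handle is simply dropped.)

```
public fun main() {
  let handle = spawn(fun() {
    while !stop_requested() {
      print("working")
      sleep_for(milliseconds: 100)
    }
  })
  sleep_for(milliseconds: 300)
  // `handle` is dropped here without request_stop() or await(); dropping it sends
  // the stop signal, so the spawned loop terminates just as in the previous snippet.
}
```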
There is another case that involves cancellation: the spawned work is stopped without the caller's knowledge. The caller still expects a value from the spawned work, but that value will never be produced; an exception is used to signal the absence of the return value. Here is an example:
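(A sketch only: how the absence of the value surfaces, and Hylo's error-handling syntax, are both still open questions; `long_computation()` and `shut_down_spawned_work()` are stand-ins for the work and for whatever stops it behind the caller's back.)

```
public fun main() {
  let handle = spawn(fun() -> Int { return long_computation() })
  // Something outside the caller stops the spawned work before it produces a value.
  shut_down_spawned_work()
  let r = handle.await()   // no value will ever arrive, so await() signals the
                           // absence of the result with an exception
  print(r)                 // never reached
}
```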
Let's discuss an example drawn from real life, one that also covers control flow. We are trying to build an HTTP server driver, in which we need to dispatch work whenever a new request comes in on a connection.
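A sketch of such a driver loop might look as follows; the `Connection`, `Request`, and `Response` types and the three functions are assumed to exist, and their signatures are illustrative only:

```
fun serve(_ connection: inout Connection) {
  while connection.is_active() {
    let request = get_request(&connection)     // concurrency hidden inside
    let response = handle_request(request)     // concurrency hidden inside
    send_response(&connection, response)       // concurrency hidden inside
  }
}
```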
At first glance, there is no concurrency expressed in this code. This is because the concurrency concerns are encapsulated in the called functions. The three main functions here, `get_request()`, `handle_request()`, and `send_response()`, all do concurrent work.
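To make this concrete, here is a sketch of what the inside of one of these functions might look like; every name below (the schedulers, `read_bytes`, `parse_request`, `continue_on`) is hypothetical:

```
fun get_request(_ connection: inout Connection) -> Request {
  // The raw network read runs on a scheduler dedicated to network I/O ...
  let bytes = spawn(on: networkScheduler, fun() { read_bytes(&connection) }).await()
  // ... while parsing the request is CPU work, so we switch back to the default scheduler.
  continue_on(default_scheduler())
  return parse_request(bytes)
}
```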
In this example, we've shown how we can use the most appropriate scheduler to execute a given piece of work. We might have a scheduler for doing network I/O (reading and writing), a scheduler for the main logic of analysing the request, and a scheduler used for actually processing the request. While processing the request, we may involve other types of schedulers, and, most probably, multiple threads may be used to speed up the execution of the request. Having multiple schedulers gives the user control over different parts of the application; the user is able to fine-tune the level of concurrency exposed by the application.
As with previous examples, controlling concurrency is straightforward for the user: they just need to express which concurrency constraints apply to different parts of the code, in this case by specifying on which scheduler a particular piece of code needs to run.