Introduction

Nornir is a runtime support that lets you specify performance and/or power consumption requirements for parallel applications. Nornir enforces these requirements by selecting an appropriate amount of resources to allocate to the application (e.g. number of cores, clock frequency, thread mapping). The application is monitored throughout its entire execution, so that these guarantees hold even in the presence of workload fluctuations or phase changes. Nornir is written as a customizable C++ framework, so users can plug in their own decision policies.


Target Applications

Before moving forward, we need to clarify which kinds of applications can be controlled with Nornir. At the moment, Nornir assumes that the most intensive part of the computation is split homogeneously among a certain number of threads. For example, if your application runs 24 threads, 20 of them might perform the intensive part of the computation, all executing the same code (whether on the same data or on different data elements does not matter). This execution model is common to many computational models, such as map-reduce, task-parallel computations, data-parallel computations, data stream processing, thread pools and others. For example, in the PARSEC benchmark suite, all the applications except one follow this model. Nornir also works on unstructured applications, as long as the underlying runtime system (e.g. Intel TBB, OpenMP, FastFlow or others) keeps the computation well balanced and distributed among the threads.


From a technical standpoint, Nornir can provide different types of guarantees according to the programming framework used to write the application. Currently, we target the following types of applications:

  1. Already existing applications written with the FastFlow framework. This is described in the FastFlow Applications section.
  2. Already existing applications written with any formalism/framework and instrumented with our internal instrumentation tool. This is described in the Instrumented Applications section.
  3. Already existing applications written with any formalism/framework. This is described in the External Applications section.

For applications in point 1, we have access to much information about the internal structure of the application, the synchronization mechanisms used and other additional details. Accordingly, this is the scenario that should in principle provide the best guarantees and tradeoffs in terms of power consumption and performance.

For applications in points 1 and 2, we are able to retrieve information about the true performance of the application. For example, in the case of a video processing application, we can retrieve the number of frames processed per second. Accordingly, the user can express explicit performance requirements, such as the minimum number of frames per second that the application should process. Choosing solution 1 may require rewriting the application (if it is not already written with FastFlow). Solution 2, on the other hand, only requires adding a few instrumentation calls to the existing application code.

For applications in point 3, no additional programming effort is required from the user. However, in this case we do not have access to the real performance of the application (for example, frames processed per second). For this reason, we approximate performance by using hardware counters, such as the number of assembler instructions executed per time unit or floating-point operations per second. Accordingly, the user can no longer specify requirements in terms of real application performance (e.g. frames per second).
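To make the idea of a counter-based performance proxy concrete, here is a minimal sketch of how an "instructions per second" figure could be derived from two readings of a cumulative instruction counter. The CounterSample type and the instructionsPerSecond function are hypothetical names introduced only for this illustration; they are not part of Nornir, and how Nornir actually reads the counters is not described here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of a cumulative hardware counter (not a Nornir type).
struct CounterSample {
    std::uint64_t instructions; // cumulative instructions executed so far
    double seconds;             // time at which the counter was read, in seconds
};

// Average number of instructions executed per second between two samples.
// Returns 0 when no time has elapsed or the counter appears to have gone
// backwards, so the caller never divides by zero or by a negative interval.
double instructionsPerSecond(const CounterSample& before,
                             const CounterSample& after) {
    const double elapsed = after.seconds - before.seconds;
    if (elapsed <= 0.0 || after.instructions < before.instructions) {
        return 0.0;
    }
    return static_cast<double>(after.instructions - before.instructions) / elapsed;
}
```

A figure obtained this way only tracks how fast the processor is working, not how many application-level elements (e.g. frames) are completed, which is why requirements on real application performance cannot be expressed in this scenario.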

To summarize, going from point 3 to point 1 increases the programming effort required of the application programmer, while also increasing the quality of the solutions Nornir finds and the level of detail at which the user can express requirements.

In addition to this, if you have not yet started coding your application, Nornir provides its own parallel programming environment (based on an extension of FastFlow), which lets you express both structured and unstructured parallelism. This programming environment is described in the Nornir Programming Environment section.

FastFlow Applications

If you already have an existing FastFlow farm application, you can easily convert it in order to be controlled by the Nornir runtime support. The process is pretty straightforward and involves the following steps:

  1. The classes describing the Emitter, Workers and Collector of the farm must extend nornir::AdaptiveNode instead of ff::ff_node. ATTENTION: svc_init and svc_end are now called after each rethreading (in FastFlow they are called only once, at application start and end). Accordingly, if those operations must be performed only once, you have to guard them yourself.
  2. If the application wants to be aware of the changes in the number of workers (e.g. to redistribute internal data), the nodes can implement the notifyRethreading virtual method.
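The two steps above can be sketched as follows. To keep the example compilable on its own, it uses a stand-in base class, AdaptiveNodeSketch, in place of nornir::AdaptiveNode. The method names svc_init, svc_end and notifyRethreading come from the text above, but the signatures shown here (in particular the two arguments of notifyRethreading) are assumptions and should be checked against the Nornir headers. The Worker class and its counters are purely illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for nornir::AdaptiveNode so that this sketch compiles by itself.
// Signatures are assumptions, not copied from the Nornir headers.
class AdaptiveNodeSketch {
public:
    virtual ~AdaptiveNodeSketch() = default;
    virtual int svc_init() { return 0; }
    virtual void svc_end() {}
    virtual void notifyRethreading(std::size_t oldNumWorkers,
                                   std::size_t newNumWorkers) {
        (void)oldNumWorkers;
        (void)newNumWorkers;
    }
};

// Illustrative worker: guards one-time setup and tracks the worker count.
class Worker : public AdaptiveNodeSketch {
public:
    int initCalls = 0;             // how many times svc_init ran
    int oneTimeSetups = 0;         // how many times the guarded setup ran
    std::size_t activeWorkers = 0; // last worker count we were notified of

    int svc_init() override {
        ++initCalls;
        if (!initialized_) {
            // Setup that must happen only once (e.g. opening a file) goes
            // here, since svc_init runs again after every rethreading.
            ++oneTimeSetups;
            initialized_ = true;
        }
        return 0;
    }

    void notifyRethreading(std::size_t /*oldNumWorkers*/,
                           std::size_t newNumWorkers) override {
        // Redistribute internal data here if it depends on the worker count.
        activeWorkers = newNumWorkers;
    }

private:
    bool initialized_ = false;
};
```

With a real Nornir build, the same guard pattern would apply to a class deriving from nornir::AdaptiveNode. The same reasoning holds for svc_end if its cleanup must run only at application end.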

At this point, the farm can be passed to the Nornir manager, which will take care of the execution, providing the performance and/or power consumption required, as shown in the following code snippet.

// Create emitter, worker, and collector and add to the farm 'farm'.
nornir::Parameters parameters;
parameters.contractType = CONTRACT_PERF_BANDWIDTH;
parameters.requiredBandwidth = 40;
nornir::ManagerFarm<> manager(&farm, parameters); // Create nornir manager.
manager.start(); // Start farm.
manager.join(); // Wait for farm end.
				

In this snippet, we show how to manage an already existing farm by requiring a minimum performance of 40 tasks processed per second. More details on the parameters can be found in the Parameters section. A full working example can be found under the ./demo folder.

Attention!

An application using Nornir needs to be run with sudo rights, since Nornir changes the clock frequency, the mapping of threads to cores and other system settings. Moreover, in some cases privileged rights are also needed to read the power consumption.

Instrumented Applications

This part is still experimental. The code is almost ready but it needs to be further tested, cleaned and documented. If you would like to use Nornir by instrumenting your application, please contact me.

External Applications

This part is still experimental. The code is almost ready but it needs to be further tested, cleaned and documented. If you would like to use Nornir on an already existing application, please contact me.

Nornir Programming Environment

The programming environment is ready to be used but is not yet documented. If you would like to use it, please contact me.

Parameters

Different parameters can be provided to Nornir, specifying the type of requirements, the type of resources on which we would like to operate and other additional parameters for algorithm tuning. Such parameters can be specified programmatically, by setting the members of the Parameters class, or by specifying an XML file name in the constructor of the Parameters class. The most important parameters that can be specified by the user concern the requirements, and they are:

  1. powerConsumption: The maximum allowed power consumption.
  2. bandwidth: The minimum required bandwidth in terms of application elements processed per second.
  3. executionTime: The maximum required completion time.
  4. expectedTasksNumber: The number of application elements to be processed by the application.
  5. minUtilization: The minimum allowed utilization (in the queueing theory sense), between 0 and 100 (default = 80.0).
  6. maxUtilization: The maximum allowed utilization (in the queueing theory sense), between 0 and 100 (default = 90.0).
  7. latency: The maximum latency per iteration (NOT SUPPORTED AT THE MOMENT).
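As a rough illustration of how some of these requirements relate to each other, the sketch below shows two pieces of arithmetic. Processing expectedTasksNumber elements within executionTime implies a minimum bandwidth. Utilization in the queueing theory sense is the ratio between the arrival rate and the service rate, expressed here as a percentage. The function names are hypothetical and the code is not taken from Nornir; how Nornir reacts when the utilization leaves the [minUtilization, maxUtilization] range is not covered here.

```cpp
#include <cassert>

// Minimum bandwidth (elements per second) needed to process
// expectedTasksNumber elements within executionTimeSeconds.
double requiredBandwidth(double expectedTasksNumber,
                         double executionTimeSeconds) {
    assert(executionTimeSeconds > 0.0);
    return expectedTasksNumber / executionTimeSeconds;
}

// Utilization in the queueing theory sense, as a percentage:
// rate at which elements arrive divided by rate at which they can be served.
double utilizationPercent(double arrivalRate, double serviceRate) {
    assert(serviceRate > 0.0);
    return 100.0 * arrivalRate / serviceRate;
}

// True when the utilization lies inside the allowed range. The default
// bounds mirror the documented defaults of minUtilization and maxUtilization.
bool utilizationWithinBounds(double utilization,
                             double minUtilization = 80.0,
                             double maxUtilization = 90.0) {
    return utilization >= minUtilization && utilization <= maxUtilization;
}
```

For example, 1200 elements to be processed within 60 seconds require at least 20 elements per second, and a configuration able to serve 100 elements per second while receiving 85 per second has a utilization of 85, which falls inside the default bounds.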

If specified programmatically, these parameters must be set on the requirements object. If specified through an XML file, they must be placed under the <requirements> XML tag, as in the following example, where the user asks Nornir to find the best-performing configuration with a maximum power consumption of 50 watts:

<?xml version="1.0" encoding="UTF-8"?>
<nornirParameters>
	<requirements>
		<bandwidth>MAX</bandwidth>
		<powerConsumption>50</powerConsumption>
	</requirements>
</nornirParameters>
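Conceptually, the request above ("maximum bandwidth with at most 50 watts") is a constrained selection problem. The following sketch shows the idea with a plain exhaustive scan over candidate configurations whose bandwidth and power have already been predicted. This is only an illustration under stated assumptions: the Candidate type, the bestUnderPowerCap function and the numbers used with them are hypothetical, and the scan is not the decision algorithm implemented by Nornir.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate configuration with predicted metrics (not a Nornir type).
struct Candidate {
    int cores;                  // number of cores allocated
    double frequencyGHz;        // clock frequency
    double predictedBandwidth;  // predicted elements processed per second
    double predictedPowerWatts; // predicted power consumption
};

// Returns the index of the candidate with the highest predicted bandwidth
// among those whose predicted power does not exceed powerCapWatts,
// or -1 if no candidate satisfies the cap.
int bestUnderPowerCap(const std::vector<Candidate>& candidates,
                      double powerCapWatts) {
    int best = -1;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const Candidate& c = candidates[i];
        if (c.predictedPowerWatts > powerCapWatts) {
            continue; // violates the power requirement
        }
        if (best < 0 ||
            c.predictedBandwidth >
                candidates[static_cast<std::size_t>(best)].predictedBandwidth) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Given three made-up candidates predicted at 35, 48 and 62 watts, a 50 watt cap excludes the last one and the scan returns the faster of the remaining two.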

Other important parameters that can be specified by the user are:

  1. knobCoresEnabled: Allows Nornir to find the best amount of cores allocated to the application (default = true).
  2. knobMappingEnabled: Allows Nornir to find the best allocation of threads on cores (default = true).
  3. knobFrequencyEnabled: Allows Nornir to find the best clock frequency (default = true).
  4. knobHyperthreadingEnabled: Allows Nornir to find the best hyperthreading level (default = false).
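One way to think about these knobs is that each enabled knob adds a dimension to the set of configurations that can be explored. The sketch below enumerates such a set for two of the knobs (cores and frequency). It assumes that a disabled knob stays fixed at its maximum value and that the frequency list is sorted in ascending order; what Nornir actually does with a disabled knob is not stated here. KnobFlags, Configuration and enumerateConfigurations are hypothetical names, not Nornir types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flags mirroring two of the documented knobs and their defaults.
struct KnobFlags {
    bool coresEnabled = true;     // corresponds to knobCoresEnabled
    bool frequencyEnabled = true; // corresponds to knobFrequencyEnabled
};

// Hypothetical configuration: a number of cores and a clock frequency.
struct Configuration {
    int cores;
    double frequencyGHz;
};

// Enumerates the configurations reachable with the enabled knobs.
// Assumption: a disabled knob is kept at its maximum value
// (maxCores, or the last entry of the ascending frequency list).
std::vector<Configuration> enumerateConfigurations(
        const KnobFlags& flags, int maxCores,
        const std::vector<double>& frequenciesGHz) {
    assert(maxCores > 0 && !frequenciesGHz.empty());
    std::vector<Configuration> result;
    const int firstCores = flags.coresEnabled ? 1 : maxCores;
    for (int cores = firstCores; cores <= maxCores; ++cores) {
        if (flags.frequencyEnabled) {
            for (double f : frequenciesGHz) {
                result.push_back({cores, f});
            }
        } else {
            result.push_back({cores, frequenciesGHz.back()});
        }
    }
    return result;
}
```

With 4 cores and 3 available frequencies, both knobs enabled give 12 configurations, while disabling the cores knob leaves only the 3 frequency choices. Disabling knobs therefore shrinks the space to explore at the cost of possibly missing a better configuration.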

There exist other additional parameters but they are mostly used for performance tuning and debugging. Accordingly, they will not be documented at the moment.

Customization

Nornir can be customized by adding different decision and prediction strategies. This is done by extending the relevant abstract classes and defining the appropriate member functions. All the infrastructure to monitor the system and to apply the decisions is already provided by Nornir, so you can focus only on the algorithmic parts of the decision strategy.

This feature is ready to be used but not yet documented. If you would like to add custom decision and/or prediction policies to Nornir, please contact me.

References

Nornir has been used in recent years to demonstrate the efficiency and accuracy of different algorithms. The reference article is:

In this article you can find some details about how Nornir works under the hood.



Additional information about different parts of Nornir can be found in the following papers: