Exploratory Group - Embedded Heterogeneous Communication

For Software Engineers Who are NOT Focused on Communication, but Require It

Khronos is considering an open standardization initiative to unify point-to-point communication under a single, simple API. The aims are to reduce application complexity, minimize development costs, and shorten time-to-market for high-performance embedded products. If successful, the new standard could transform the way applications are developed for heterogeneous systems and edge computing.

Current Issues With Existing APIs

No Single API Works With All Endpoint Variations

Edge computing applications typically need several point-to-point APIs to collect various types of sensor data and to distribute computing across processors, processes, and threads. Developing and maintaining code against all of them is not trivial.

SOLUTION

A unified communication API should fit all endpoint variations.

Existing APIs are Not Simple or Intuitive

Engineers who focus on algorithm development rather than communication struggle with the steep learning curves of communication APIs. Setting up communication endpoints and invoking efficient transfers correctly can take weeks or months, when it should take hours or days.

SOLUTION

A unified communication API should include only a few simple and intuitive concepts:

Create / Destroy Endpoint
Send --> Receive
Read / Write

Common APIs	Function Count	Typical Drawbacks
Sockets	~20	Confusing naming / concepts; difficult to tune for performance
RDMA	~100	Confusing naming / concepts; high learning curve, experts are rare
libFabrics	~100	Confusing naming / concepts; high learning curve, experts are rare
MPI	~300	Confusing transfer variations; ‘mpirun’ not portable and difficult to tune

Existing APIs are Missing Critical Features

Embedded / edge computing has very different requirements from large homogeneous server-farm computing. A missing feature may severely limit the choice of API, which may in turn force a much steeper learning curve. For example:

MPI is very simple at face value, but an engineer who is designing for real time (which needs determinism) and must allow for dropped data (a UDP-like protocol) can't use it, and may be forced to use the very complex RDMA.
Sockets can't pre-register memory addresses at creation time, so a TCP sender of a large message will need to block until the TCP receiver gets called.

SOLUTION

A unified communication API should support these critical features:

Reliable transfers (every byte matters)
Unreliable transfers (allow dropping data)
Fault-tolerance hooks that let the application be fault tolerant (timeouts, disconnect detection, creating endpoints on the fly)
All endpoint localities: inter-thread, inter-process, inter-device
All hardware: CPUs, GPUs, FPGAs, etc.
Zero-copy and one-way (not just one or the other)
Non-blocking (so the CPU is not bogged down waiting)
Two-sided transfers (a coordination between send and recv)
One-sided transfers (read/write/atomics) where the remote endpoint is not involved
Multiple memory blocks per message (mix CPU and GPU)
Connect to third-party endpoints

Some underlying interconnects may not support all the features, but the API should not limit the interconnects that can.

Unrealized Performance

Performance is a combination of throughput, latency, and determinism. Some communication APIs deliver less performance than the underlying interconnect provides. For example, if the destination address of the data is not known until transfer time, then the transfer may be zero-copy but it can't be one-way; i.e. a round trip is required to learn where to place the data.

SOLUTION

A unified API should provide best performance via:

Zero-copy AND one-way: when possible, data addresses need to be pre-registered (and pinned) when the endpoint is initialized
Non-blocking: when possible, transfers should be offloaded to a DMA engine, which frees up main processing resources
Minimize implicit activity at transfer time: the application should be responsible for synchronizing memory buffer use

Proposals

Proposals are welcome. Please contact us if you would like to discuss getting involved.

Only 8 Function Calls!

Group	Functions	Details
Create	takyonCreate(), takyonDestroy()	Dynamically create and destroy endpoints
Two-Sided	takyonSend(), takyonIsSent(), takyonPostRecvs(), takyonIsRecved()	Both endpoints are involved in the transfer via a coordinated send -> recv
One-Sided	takyonOneSided(), takyonIsOneSidedDone()	One endpoint does all the work, and the other endpoint is not involved

100% Open Source

GitHub | Presentation | takyon.h

Recently Adopted By

Lockheed Martin
Ametek / Abaco Systems
