Tag matching logic

The MPI standard defines a set of rules, known as tag-matching, for matching
source send operations to destination receives. The following source and
destination parameters must match:
* Communicator
* User tag - a wildcard may be specified by the receiver
* Source rank - a wildcard may be specified by the receiver
* Destination rank - wild
The ordering rules require that when more than one pair of send and receive
message envelopes may match, the pair that includes the earliest posted send
and the earliest posted receive is the pair that must be used to satisfy the
matching operation. However, this doesn’t imply that tags are consumed in
the order they are created: a tag generated later may be consumed first if
earlier tags can’t be used to satisfy the matching rules.

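As a rough illustration of the matching and ordering rules, the following
Python sketch models message envelopes and picks the earliest posted receive
that matches an incoming send. The names Send, Recv, ANY_SOURCE, ANY_TAG and
first_matching_recv are invented for this example and are not part of MPI or
of any library; the wildcard constants only stand in for MPI_ANY_SOURCE and
MPI_ANY_TAG.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical wildcard markers, standing in for MPI_ANY_SOURCE / MPI_ANY_TAG.
ANY_SOURCE = -1
ANY_TAG = -1


@dataclass
class Send:
    comm: int
    source: int
    dest: int
    tag: int


@dataclass
class Recv:
    comm: int
    source: int  # may be ANY_SOURCE
    dest: int    # rank that posted the receive
    tag: int     # may be ANY_TAG


def envelopes_match(send: Send, recv: Recv) -> bool:
    """Communicator and destination must be equal; the receiver may
    wildcard the source rank and the user tag."""
    return (send.comm == recv.comm
            and send.dest == recv.dest
            and recv.source in (ANY_SOURCE, send.source)
            and recv.tag in (ANY_TAG, send.tag))


def first_matching_recv(send: Send, posted: List[Recv]) -> Optional[int]:
    """Return the index of the earliest posted receive matching `send`."""
    for index, recv in enumerate(posted):  # `posted` is ordered oldest first
        if envelopes_match(send, recv):
            return index
    return None


posted = [
    Recv(comm=0, source=3, dest=1, tag=1),
    Recv(comm=0, source=ANY_SOURCE, dest=1, tag=ANY_TAG),
    Recv(comm=0, source=3, dest=1, tag=2),
]
# A tag-2 send skips the older tag-1 receive and lands on the wildcard
# receive, which was posted before the exact tag-2 receive.
print(first_matching_recv(Send(comm=0, source=3, dest=1, tag=2), posted))
```

The tag-1 receive stays on the list even though it was posted first, which is
the sense in which tags need not be consumed in the order they were created.
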
When a message is sent from the sender to the receiver, the communication
library may attempt to process the operation either before or after the
corresponding matching receive is posted. If a matching receive has already
been posted, this is an expected message; otherwise it is called an
unexpected message. Implementations frequently use different matching schemes
for these two cases.

To keep the MPI library memory footprint down, MPI implementations typically
use two different protocols to transfer message data:

1. The Eager protocol - the complete message is sent when the send is
processed by the sender. A send completion is received in the send_cq,
notifying the sender that the buffer can be reused.

2. The Rendezvous protocol - the sender sends the tag-matching header,
and perhaps a portion of the data, when first notifying the receiver. When the
corresponding buffer is posted, the responder uses the information from
the header to initiate an RDMA READ operation directly into the matching
buffer. A fin message must be received before the buffer can be reused.

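The difference between the two protocols can be sketched as follows. This is
a simplified model, not UCX or verbs code: EAGER_LIMIT, WireMessage,
build_send, deliver and release_event are hypothetical names, the threshold
value is arbitrary, choosing the protocol purely by message size is an
assumption of the sketch, and a plain slice of the sender's buffer stands in
for the RDMA READ.

```python
from dataclasses import dataclass

EAGER_LIMIT = 1024  # hypothetical size threshold in bytes


@dataclass
class WireMessage:
    protocol: str   # "eager" or "rendezvous"
    tag: int        # tag-matching header field
    length: int     # total message length in bytes
    payload: bytes  # whole message for eager, empty for rendezvous here


def build_send(tag: int, data: bytes, limit: int = EAGER_LIMIT) -> WireMessage:
    """Choose a protocol by message size and build what goes on the wire."""
    if len(data) <= limit:
        return WireMessage("eager", tag, len(data), data)
    # Rendezvous: only the header travels now; the data stays with the sender.
    return WireMessage("rendezvous", tag, len(data), b"")


def deliver(msg: WireMessage, sender_buffer: bytes) -> bytes:
    """Fill the matched receive buffer once a receive has been posted."""
    if msg.protocol == "eager":
        return msg.payload
    # Stand-in for the RDMA READ the responder issues using the header.
    return sender_buffer[:msg.length]


def release_event(msg: WireMessage) -> str:
    """Event after which the sender may reuse its buffer."""
    return "send_cq completion" if msg.protocol == "eager" else "fin message"


small = b"a" * 16
large = b"b" * 4096
for data in (small, large):
    msg = build_send(tag=7, data=data)
    assert deliver(msg, data) == data
    print(msg.protocol, len(msg.payload), release_event(msg))
```

In the rendezvous case nothing but the header is buffered on the receive side
before a receive is posted, which is how the protocol limits the memory held
for large unexpected messages.
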
Tag matching implementation

There are two types of matching objects used: the posted receive list and the
unexpected message list. The application posts receive buffers to the posted
receive list through calls to the MPI receive routines, and posts send
messages using the MPI send routines. The head of the posted receive list may
be maintained by the hardware, with the software expected to shadow this list.

When a send is initiated and arrives at the receive side, if there is no
pre-posted receive for the arriving message, it is passed to the software and
placed in the unexpected message list. Otherwise the match is processed,
including rendezvous processing if appropriate, and the data is delivered to
the specified receive buffer. This allows overlapping receive-side MPI tag
matching with computation.

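The arrival path might look like this in outline. The sketch is software-only
and ignores hardware offload and rendezvous processing; PostedRecv,
ReceiveSide and on_arrival are hypothetical names, and matching is reduced to
a (source, tag) comparison with a single made-up wildcard value.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ANY = -1  # hypothetical wildcard for source or tag


@dataclass
class PostedRecv:
    source: int
    tag: int
    buffer: bytearray = field(default_factory=bytearray)


@dataclass
class ReceiveSide:
    posted: List[PostedRecv] = field(default_factory=list)  # oldest first
    unexpected: List[Tuple[int, int, bytes]] = field(default_factory=list)

    def on_arrival(self, source: int, tag: int, data: bytes) -> str:
        """Process one arriving message and report how it was classified."""
        for index, recv in enumerate(self.posted):
            if recv.source in (ANY, source) and recv.tag in (ANY, tag):
                # Expected message: consume the receive and fill its buffer.
                del self.posted[index]
                recv.buffer.extend(data)
                return "expected"
        # No pre-posted receive: keep the message until one is posted.
        self.unexpected.append((source, tag, data))
        return "unexpected"


side = ReceiveSide()
early = PostedRecv(source=ANY, tag=5)
side.posted.append(early)
print(side.on_arrival(source=2, tag=5, data=b"hello"))
print(side.on_arrival(source=2, tag=9, data=b"later"))
```
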
When a receive is posted, the communication library first checks the
software unexpected message list for a matching message. If a match is
found, data is delivered to the user buffer using a software-controlled
protocol. The UCX implementation uses either an eager or a rendezvous
protocol, depending on data size. If no match is found, the entire pre-posted
receive list is maintained by the hardware, and there is space to add one
more pre-posted receive to this list, then this receive is passed to the
hardware. Software is expected to shadow this list to help with processing
MPI cancel operations. In addition, because hardware and software are not
expected to be tightly synchronized with respect to the tag-matching
operation, this shadow list is used to detect the case where a pre-posted
receive is passed to the hardware while the matching unexpected message is
being passed from the hardware to the software.
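
A minimal sketch of the receive-posting path, under stated assumptions: the
hardware list is modelled only by its software shadow, hw_capacity is an
invented limit, matching is on the tag alone, receives that cannot be
offloaded are assumed to stay on a software list, and the race between
hardware and software is not modelled. TagMatcher, post_recv and cancel are
hypothetical names.

```python
from typing import List, Optional, Tuple

ANY_TAG = -1  # hypothetical wildcard


class TagMatcher:
    def __init__(self, hw_capacity: int) -> None:
        self.hw_capacity = hw_capacity  # assumed hardware list size
        self.unexpected: List[Tuple[int, bytes]] = []  # (tag, data)
        self.hw_shadow: List[int] = []  # shadow of receives given to hardware
        self.sw_posted: List[int] = []  # receives kept in software only

    def post_recv(self, tag: int) -> Tuple[str, Optional[bytes]]:
        # 1. Check the software unexpected message list first.
        for index, (msg_tag, data) in enumerate(self.unexpected):
            if tag in (ANY_TAG, msg_tag):
                del self.unexpected[index]
                return "matched", data
        # 2. Offload only if every pending receive is already in hardware
        #    and the hardware list has room for one more.
        if not self.sw_posted and len(self.hw_shadow) < self.hw_capacity:
            self.hw_shadow.append(tag)
            return "hardware", None
        # 3. Otherwise the receive stays on the software list (assumption).
        self.sw_posted.append(tag)
        return "software", None

    def cancel(self, tag: int) -> bool:
        """Use the software-side lists to find a receive to cancel."""
        for pending in (self.sw_posted, self.hw_shadow):
            if tag in pending:
                pending.remove(tag)
                return True
        return False


matcher = TagMatcher(hw_capacity=1)
matcher.unexpected.append((7, b"early data"))
for tag in (7, 1, 2):
    print(tag, matcher.post_recv(tag))
```

Making step 2 conditional on an empty software list is one plausible reading
of the requirement that the entire pre-posted receive list be in hardware: it
keeps a newer offloaded receive from overtaking an older one that only
software knows about.
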