The Snorkel Mark 1: A Programmable DMA Channel

This paper describes a simple SRAM master that moves arbitrary sequences of 18-bit or 16-bit data between one of its COM ports and specified areas of the external SRAM. It does so with a software-defined, special-purpose processor that executes a stored program from SRAM defining the sequence of transfers.

The problem and its solution are not new. In fact, very similar Programmable DMA Channels were part of the hardware in mainframe computers of the 1960s. In later minicomputer and microprocessor designs, DMA channels lost the ability to execute stored programs; instead, single transfers were controlled with registers and main CPU software. More recently, this problem has been addressed by giving DMA capability to each controller attached to a common system bus such as PCI. However, the Programmable DMA Channel remains a very useful tool, and one of the nicer things about the GreenArrays architecture is the ability to implement such devices in software.

This module illustrates a number of effective techniques for minimizing the size of the F18 code; indeed, it was necessary to employ them in order to fit this functionality into only 64 words. Anyone interested in learning to program our chips well should benefit greatly from a careful study of this software.

1. Problem Statement

Many applications require transactions that involve one or more movements of data between external memory and some destination or device. The transfers may include a single datum or an arbitrarily large block of data. It may be necessary to make multiple transfers for each transaction; for example, a simple SCSI operation requires sending a packet of data representing a command structure, then sending or receiving one or more blocks of data, and finally receiving a status structure. It would in many cases be inconvenient to make all of these subsidiary transfers to or from a single contiguous area of memory, even if the data direction were the same for all of them. Further, in our systems with 18-bit internal operations, some transfers will need to be moving 18-bit data to and from external memory, while others that terminate in structures or protocols based on 16-bit data will need to transfer to and from external memory in those units and alignments.

It is easy to accomplish this using programmed I/O. However, programmed I/O cannot run at memory speed in a high-level external virtual machine environment, and it is even slower when multiprogramming response times prevent a program from simply bursting an entire transfer in or out uninterrupted. Making transfers at memory speed generally requires something nearer the hardware than programmed I/O. We call this mechanism a Snorkel, suggesting a means whereby data or software immersed in the external SRAM may reach the "fresh air" in our fabric of high-speed F18 nodes.

2. Our Solution

In the 1960s, many mainframe computers addressed this problem with Programmable DMA Channels. The terms varied but the basic idea was to build a hardware device that acted as a memory master and could transfer data at memory speed. The simplest just transferred one block of words or bytes in a given direction and required program intervention to transfer another. The more powerful executed what some called a channel program. In all cases the work of transferring data between memory and devices was offloaded from general purpose CPUs to special-purpose hardware that was optimized for making such transfers efficiently with minimal program intervention.

The speed difference between an F18 node and the external SRAM is great enough that we have the luxury of creating in software simple "devices" that fulfill roles which would otherwise require custom hardware. Because this is easy to do, there is no need to solve hypothetical future problems with current work. In general, solving only the problem at hand conserves all relevant resources, such as time, space, energy and programming labor. In addition, there is no need to compose entirely hypothetical test cases.

The Snorkel is a simple, special-purpose processor that executes stored Snorkel programs from external SRAM which tell it to perform an arbitrary sequence of 18- or 16-bit transfers to or from arbitrary areas of external memory. It occupies one of the SRAM master nodes and moves data between SRAM and any of the COM ports of the Snorkel node.

3. Implementation

This program, which occupies exactly 64 words of F18 RAM, implements our Snorkel in either SRAM master node (108 or 207).

The program is entered at idle with b initialized to the address of the port leading to node 107. For example, to load this program (note that its object code is in bin 1605) into node 207, the following descriptor suffices:

```
  snorkel mk1  207 +node 1605 /ram up /b 37 /p
```

On boot, and after completing each program, this node waits with @b for a stimulus from the SRAM interface. To start a program, some other entity writes the 16-bit cell address of a Snorkel program into mmptr (an agreed-upon cell in external SRAM, address 4 in this case) and then sends a stimulus to the Snorkel through the SRAM interface. The node swaps a zero into mmptr to indicate that it has retrieved the pointer, which it then uses to fetch and execute Snorkel program instructions. Execution continues until a fin instruction is encountered. The node then stores zero where the fin instruction came from to show completion, and sends a stimulus to another node (in this case node 106 of the Virtual Machine, determined by the value x8000 passed to mk' in fin) to awaken it in case that node was able to suspend while waiting for Snorkel completion. The Snorkel program structure and instruction set are as follows:

<table>
<thead>
<tr>
<th>Field (16-bit cells)</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>Port address: hi, low</td>
<td>Address of port to use (occurs once), followed by one or more 5-cell instructions structured as follows:</td>
</tr>
<tr>
<td>Op-code</td>
<td>Address of the F18 routine that performs the instruction</td>
</tr>
<tr>
<td>Count: hi, low</td>
<td>Transfer size, given as (words through port) - 1; an 18-bit count, so at most 262144 words</td>
</tr>
<tr>
<td>SRAM address: hi, low</td>
<td>20-bit SRAM address for transfer</td>
</tr>
</tbody>
</table>
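
The following minimal Python sketch makes the layout concrete by packing a Snorkel program into 16-bit cells. It goes beyond the table in a few assumed details. Each hi/low pair is stored hi cell first (the polyFORTH double-number order), and the hi cell holds only the bits above bit 15. The port address is packed the same way as the other wide fields. The op-code and port values in the usage lines are hypothetical placeholders, not the real Mark 1 values.

```python
from typing import List, Tuple

MASK16 = 0xFFFF


def split(value: int, bits: int) -> Tuple[int, int]:
    """Split a value of at most `bits` bits into (hi, low) 16-bit cells."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value:#x} does not fit in {bits} bits")
    return value >> 16, value & MASK16


def instruction(opcode: int, words: int, sram_addr: int) -> List[int]:
    """One 5-cell instruction: op-code, count (hi, low), address (hi, low)."""
    if not 1 <= words <= 1 << 18:
        raise ValueError("a transfer moves 1..262144 words through the port")
    count_hi, count_lo = split(words - 1, 18)  # the count field holds words - 1
    addr_hi, addr_lo = split(sram_addr, 20)    # 20-bit SRAM cell address
    return [opcode & MASK16, count_hi, count_lo, addr_hi, addr_lo]


def program(port: int, steps: List[Tuple[int, int, int]], fin: int) -> List[int]:
    """Port address once, one instruction per (opcode, words, addr), then fin."""
    cells = list(split(port, 18))              # assumed hi/low pair, hi first
    for opcode, words, addr in steps:
        cells += instruction(opcode, words, addr)
    cells += [fin & MASK16, 0, 0, 0, 0]        # fin's four following cells are ignored
    return cells


if __name__ == "__main__":
    # Hypothetical port and op-code values, for illustration only.
    PORT, O18, I16, FIN = 0x115, 0x20, 0x28, 0x30
    cells = program(PORT, [(O18, 6, 0x12345), (I16, 8, 0x00400)], FIN)
    print(len(cells), [hex(c) for c in cells])  # 2 + 3 * 5 = 17 cells
```

The count is stored as words - 1, so an 18-bit field reaches the documented maximum of 262144 words (2^18) rather than 262143.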

Decoding is simple: after reading an op-code from external memory, we jump to that address in internal RAM using the sequence push ; at the end of idle. We use this technique in many ways. Another example of such a technique is calling dma, which runs a loop that invokes the code following the call on each pass through the loop. Such techniques reduce code size and make it feasible to express programs like this one in only 64 words.
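
The decoding scheme can also be modeled outside the F18. The Python sketch below is a behavioral model only, not a translation of the 64-word program. A dictionary of routines stands in for jumping to the op-code's address with push ;. A dma method that calls a per-item function count+1 times stands in for the call to dma whose loop runs the code after the call. The op-code values are hypothetical. The two-cell port address, hi-cell-first storage and list-based port are assumptions of the model.

```python
from typing import Callable, Dict, List

O16, I16, O18, I18, FIN = 0x10, 0x11, 0x12, 0x13, 0x14  # hypothetical values


class SnorkelModel:
    def __init__(self, sram: Dict[int, int], to_port: List[int], from_port: List[int]):
        self.sram = sram            # 16-bit cells keyed by cell address
        self.to_port = to_port      # words sent through the port
        self.from_port = from_port  # words the neighbor node will send
        self.pc = 0
        self.addr = 0
        self.port = 0
        # Dispatch table: stands in for `push ;` jumping to the op-code address.
        self.routines: Dict[int, Callable[[], None]] = {
            O16: self.o16, I16: self.i16, O18: self.o18, I18: self.i18}

    def fetch(self) -> int:
        cell = self.sram.get(self.pc, 0)
        self.pc += 1
        return cell

    def fetch2(self) -> int:
        hi = self.fetch()                      # most significant cell first
        return (hi << 16) | self.fetch()

    def dma(self, per_item: Callable[[], None]) -> None:
        """Read count and address, then run per_item count+1 times."""
        count = self.fetch2()
        self.addr = self.fetch2()
        for _ in range(count + 1):
            per_item()

    def o16(self) -> None:                     # SRAM cell -> port, bits 15..0
        def step() -> None:
            self.to_port.append(self.sram.get(self.addr, 0) & 0xFFFF)
            self.addr += 1
        self.dma(step)

    def i16(self) -> None:                     # port bits 15..0 -> SRAM cell
        def step() -> None:
            self.sram[self.addr] = self.from_port.pop(0) & 0xFFFF
            self.addr += 1
        self.dma(step)

    def o18(self) -> None:                     # double number in SRAM -> 18-bit word
        def step() -> None:
            hi = self.sram.get(self.addr, 0) & 0x3
            lo = self.sram.get(self.addr + 1, 0)
            self.to_port.append((hi << 16) | lo)
            self.addr += 2
        self.dma(step)

    def i18(self) -> None:                     # 18-bit word -> double number in SRAM
        def step() -> None:
            word = self.from_port.pop(0) & 0x3FFFF
            self.sram[self.addr] = word >> 16
            self.sram[self.addr + 1] = word & 0xFFFF
            self.addr += 2
        self.dma(step)

    def run(self, start: int) -> None:
        self.pc = start
        self.port = self.fetch2()              # port address, read once
        while True:
            op_addr = self.pc
            op = self.fetch()
            if op == FIN:
                self.sram[op_addr] = 0         # zero where fin came from: done
                return
            self.routines[op]()
```

Passing the per-item step into dma keeps the counting loop in one place, which mirrors the code-size motive described above. Each transfer routine only says what happens to a single item.
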
4. Usage Examples

The polyFORTH mechanism for operating a Snorkel in node 207 begins with the mechanism required for SPI flash I/O in the nucleus:

```
2496 0 Base mechanism for I/O done by memory mastering nodes.
1 Masters 0 and 1 are nodes 108 and 207 respectively.
3 Set zero after master fetches it.
5 MMASK is the currently active mask in memory interface node.
6 |MMASK| sets master mask
7 |MMSTIM| sends stimulus to each master whose bit is set.
8 |SUSPEND| suspends VM execution until stimulated.
9 +SNORK| starts a snorkel program, returning complete flag addr.
10 HOME unmasks the snorkel so it may receive stimuli.
11 sDONE waits till the given completion flag says done.
12 |RELOAD| adds code to the reload chain. Usage:
13 'reload is a chain of executable code that returns when done.
14 sDONE ('fin) begins |RELOAD| when done.
15 HERE (Snorkel/Ganglia) HEX
```

The key words are +SNORK and sDONE, which start Snorkel programs and wait until they have completed. The address of each program's fin instruction must be passed to both words because fin serves not only as an op-code but also as a completion flag.

```
+SNORK (^pgm ^fin - ^fin) spins until the preceding Snorkel program start address has been taken by the Snorkel. It then restores the fin op-code value in the program, sets its start address, and posts a stimulus for the Snorkel through the memory controller. If the Snorkel is executing another program at that time, the stimulus is preserved until the Snorkel node returns to its idle routine.

sDONE (^fin) spins until the given Snorkel program's fin instruction has been posted to zero.
```
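
The handshake can be sketched with Python threads. This is an illustration built on assumptions, not the polyFORTH code. SRAM is a dictionary of cells. The stimulus is a threading.Event, which stays set until cleared and so mimics a stimulus preserved until the node returns to idle. FIN_OP is a hypothetical op-code value. The node stub only walks the 5-cell instructions to find fin and moves no data. MMPTR = 4 follows the text.

```python
import threading
import time

MMPTR = 4          # agreed-upon SRAM cell holding the program pointer
FIN_OP = 0x30      # hypothetical op-code value for fin
PROLOGUE = 2       # cells before the first instruction (port address; assumed)


def snorkel_node(sram, stimulus, stop):
    """Idle loop: wait for a stimulus, take the pointer, run to fin."""
    while not stop.is_set():
        if not stimulus.wait(timeout=0.05):
            continue
        stimulus.clear()
        ptr, sram[MMPTR] = sram[MMPTR], 0      # swap zero into mmptr
        if ptr == 0:
            continue
        op_addr = ptr + PROLOGUE
        while sram[op_addr] != FIN_OP:         # transfers omitted in this stub
            op_addr += 5
        sram[op_addr] = 0                      # zero where fin came from: done


def plus_snork(sram, pgm, fin_addr, stimulus):
    """Model of +SNORK ( ^pgm ^fin -- ^fin )."""
    while sram[MMPTR] != 0:                    # previous pointer not yet taken
        time.sleep(0)
    sram[fin_addr] = FIN_OP                    # restore fin, which doubles as the flag
    sram[MMPTR] = pgm                          # post the start address
    stimulus.set()                             # stimulus through the memory controller
    return fin_addr


def s_done(sram, fin_addr):
    """Model of sDONE ( ^fin ): spin until the fin cell is zero."""
    while sram[fin_addr] != 0:
        time.sleep(0)
```

In this model, plus_snork must rewrite the fin cell before posting the pointer. A program that has already run once has a zeroed fin cell, so without the rewrite s_done would report it finished immediately.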

With this logic, multiple tasks may use the mechanism without interference, as long as they neither talk to the same destination in a way that depends on destination state between transactions nor use the same Snorkel program or buffer areas in memory. In cases like those, it is up to the application to use appropriate facility reservation methods such as GET / RELEASE.

Some useful constants are defined here, but for space reasons others must wait until block 28 has been loaded:

```
2428 0 Tools for extending the memory mastering (Snorkel/Ganglia) I/O
11 that's in the nucleus.
12 :UP and :RIGHT are port adrs for starting channel pr
13 :DOWN adds code to the reload chain. Usage:
14 HERE [stuff to do before existing chain] +RELOAD [ ]
15
```

The full set of constants is then:

```
:DOWN :RIGHT :LEFT Port addresses (calls) used to start programs for a Snorkel in node 207
:UP :RIGHT Port addresses usable to start programs for a Snorkel in node 108
O16 I16 Op-codes for transfers out of and into external SRAM as 16-bit data in RAM and as bits 15..0 of words transferred through the port. Count is number of SRAM words -1.
O18 I18 Op-codes for transfers of 18-bit data. Each 18-bit datum is stored in external memory as a polyFORTH double number (most significant cell first). Count is number of double numbers -1.
FIN Op-code for terminating a Snorkel program. The following four cells are ignored. The cell containing FIN is set to zero when FIN is executed.
```

Here are two examples of Snorkel programs; the first is from an application, the second is from the nucleus:

In the first example, pgm> is a Snorkel program. It uses the down port of the Snorkel node, so the transfers will be done between external SRAM and the down port of node 207. This means that the immediate "target" of the transfers will be node 307, communicating with that node through its down port. The first Snorkel instruction transfers six 18-bit values outward, into node 307, taken from 36 16-bit cells of SRAM starting at an address that was passed into the definition on the stack. The second instruction transfers eight 16-bit values inward from node 307 to eight cells of SRAM starting at the address digest. Finally the FIN instruction ends the program; HERE places its address on the compiler stack so that it may be used as the LITERAL in the definition MDS> which executes this Snorkel program and waits for it to complete. Note the use of Z, W, and IN to lay down double and single precision numbers as 18- or 20-bit values in the Snorkel program. These words are only available after HI.

The second example shows a more complex Snorkel program at s10. It too uses node 307 as its immediate "target", performing four data transfers: an outbound transfer of /sGH 18-bit values starting at sGH; an outbound transfer of a variable number of 16-bit values from a variable address; an inbound transfer of a variable number of 16-bit values to a variable address; and an inbound transfer of one 16-bit value to SRAM at the address spisSTAT. Subsequent definitions in this block set the variable counts and addresses before executing the Snorkel program. Note that in the nucleus we use :down, which is headless, to conserve space.

All these examples are in the context of Ganglion transactions, but the Snorkel is not in any way limited by Ganglion conventions. Data may be moved between memory and any node adjacent to the Snorkel node, for any purpose and using any protocol mutually agreed upon with that node. Obviously, if the adjacent node is executing the port through which data are transferred to it at the time of the transfer, then at least the first word transferred will be interpreted as an F18 instruction word.

4.1 Special Considerations

The high-order two bits of data transferred through ports on an o16 operation will be zero on the EVB001 board because the D16 and D17 pins are pulled down. These bits are not masked by the Mark 1 SRAM Cluster. If you have laid out your own board, be aware that any signals on those pins are visible to SRAM masters and, in the case of the Snorkel, will be moved through the port on an o16 operation.
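
A short Python sketch of this point follows. For illustration, it assumes that the port word on an o16 transfer is formed from D17..D0, with the SRAM cell on D15..D0. The receiver_cell helper is a hypothetical defensive measure for boards where D16/D17 are not pulled down. It is not part of the Snorkel.

```python
def o16_port_word(cell: int, d17: int = 0, d16: int = 0) -> int:
    """18-bit word moved through the port for one SRAM cell on an o16 transfer.

    Bits 17..16 mirror the D17/D16 pins and are not masked; on EVB001 both
    pins are pulled down, so they read as zero (the defaults here).
    """
    return ((d17 & 1) << 17) | ((d16 & 1) << 16) | (cell & 0xFFFF)


def receiver_cell(word: int) -> int:
    """Hypothetical defensive step in the receiving node: keep bits 15..0."""
    return word & 0xFFFF
```
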
IMPORTANT NOTICE

GreenArrays Incorporated (GAI) reserves the right to make corrections, modifications, enhancements, improvements, and other changes to its products and services at any time and to discontinue any product or service without notice. Customers should obtain the latest relevant information before placing orders and should verify that such information is current and complete. All products are sold subject to GAI’s terms and conditions of sale supplied at the time of order acknowledgment.

GAI disclaims any express or implied warranty relating to the sale and/or use of GAI products, including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright, or other intellectual property right.

GAI assumes no liability for applications assistance or customer product design. Customers are responsible for their products and applications using GAI components. To minimize the risks associated with customer products and applications, customers should provide adequate design and operating safeguards.

GAI does not warrant or represent that any license, either express or implied, is granted under any GAI patent right, copyright, mask work right, or other GAI intellectual property right relating to any combination, machine, or process in which GAI products or services are used. Information published by GAI regarding third-party products or services does not constitute a license from GAI to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property of the third party, or a license from GAI under the patents or other intellectual property of GAI.

Reproduction of GAI information in GAI data books or data sheets is permissible only if reproduction is without alteration and is accompanied by all associated warranties, conditions, limitations, and notices. Reproduction of this information with alteration is an unfair and deceptive business practice. GAI is not responsible or liable for such altered documentation. Information of third parties may be subject to additional restrictions.

Resale of GAI products or services with statements different from or beyond the parameters stated by GAI for that product or service voids all express and any implied warranties for the associated GAI product or service and is an unfair and deceptive business practice. GAI is not responsible or liable for any such statements.

GAI products are not authorized for use in safety-critical applications (such as life support) where a failure of the GAI product would reasonably be expected to cause severe personal injury or death, unless officers of the parties have executed an agreement specifically governing such use. Buyers represent that they have all necessary expertise in the safety and regulatory ramifications of their applications, and acknowledge and agree that they are solely responsible for all legal, regulatory and safety-related requirements concerning their products and any use of GAI products in such safety-critical applications, notwithstanding any applications-related information or support that may be provided by GAI. Further, Buyers must fully indemnify GAI and its representatives against any damages arising out of the use of GAI products in such safety-critical applications.

GAI products are neither designed nor intended for use in military/aerospace applications or environments unless the GAI products are specifically designated by GAI as military-grade or “enhanced plastic.” Only products designated by GAI as military-grade meet military specifications. Buyers acknowledge and agree that any such use of GAI products which GAI has not designated as military-grade is solely at the Buyer’s risk, and that they are solely responsible for compliance with all legal and regulatory requirements in connection with such use.

GAI products are neither designed nor intended for use in automotive applications or environments unless the specific GAI products are designated by GAI as compliant with ISO/TS 16949 requirements. Buyers acknowledge and agree that, if they use any non-designated products in automotive applications, GAI will not be responsible for any failure to meet such requirements.

The following are trademarks or registered trademarks of GreenArrays, Inc., a Nevada Corporation: GreenArrays, GreenArray Chips, arrayForth, and the GreenArrays logo. polyFORTH is a registered trademark of FORTH, Inc. (www.forth.com) and is used by permission. All other trademarks or registered trademarks are the property of their respective owners.

For current information on GreenArrays products and application solutions, see www.GreenArrayChips.com

Mailing Address: GreenArrays, Inc., 774 Mays Blvd #10 PMB 320, Incline Village, Nevada 89451
Printed in the United States of America
Phone (775) 298-4748 fax (775) 548-8547 email Sales@GreenArrayChips.com
Copyright © 2010-2013, GreenArrays, Incorporated