Direct Memory Access (DMA) I/O
The problem still remains; interrupts at different stages are required: more for PIO and relatively less for highly efficient interrupt-driven I/O, and processing an interrupt done only by CPU is, however, expensive. In addition, for both the cases, all data transfers also involve CPU and must be routed only through the CPU. As a result, the costly CPU time is badly wasted that causes an adverse impact on CPU activity, and also limits the I/O transfer rate. A befitting mechanism is thus needed that would target to ultimately relieve and release the CPU to a large extent from this hazardous time-consuming I/O-related activities, and that can be accomplished by simply letting the peripheral devices themselves to directly manage the memory buses without the intervention of the CPU. This would definitely improve not only the speed of data transfer but also the overall performance of the system, as the CPU is now released to carry out its own useful work.
With the development of hardware technology, the I/O device (or its controller) can be equipped with the ability to directly transfer a block of information to or from main memory without CPU's intervention. This requires that the I/O device (or its controller) is capable of generating memory addresses and transferring data to or from the system bus: i.e. it then must be a bus master. The CPU is still responsible for initiating each block transfer, and the I/O device controller can then carry out the physical data transfer without further program execution by the CPU. The CPU and I/O controller interact only when the CPU must yield control of the system bus to the I/O controller in response to a request issued from the controller. This request is in the form of an interrupt, to be serviced by the CPU. The CPU is now involved to process the interrupt. After servicing the interrupt, the CPU is no longer there, and it can then go back to resume the execution of its previously ongoing executing program. The access of the bus is now transferred and is now rest with the controller (requesting device) which then completes the required number of cycles for the data transfer, and then again hands over the control of the bus back to CPU. This type of I/O capability is called direct memory access (DMA). For large volumes of data transfer, DMA technique is found much faster than if it is carried out by the CPU, and is observed to be adequately efficient. To implement DMA and interrupt facilities, most computers may then require the system's I/O interface to contain special DMA and interrupt control units.
Figure 5.5 shows a block diagram indicating how the DMA mechanism works.
- • The I/O device is connected to the system bus via a special interface circuit called a DMA controller;
- • The DMA controller contains a data buffer register IODR (input-output data register) to temporarily store the data, just like in the case of PIO. In addition, there is an address register IOAR (input-output address register), and a data count register DC;
- • The IOAR is used to store the address of the next word to be transferred. It is automatically incremented after each word transfer;
- • The DATA count register DC stores the number of words that remain to be transferred. It is automatically decremented after each transfer and tested for zero. When the data count reaches zero, the DMA transfer halts;
- • These registers allow the DMA controller to transfer data to and from a contiguous region of main memory.
Schematic diagram of DMA I/O.
The controller is normally provided with an interrupt capability, and in this situation, it sends interrupts to the CPU to signal the beginning and end of the data transfer. The logic necessary to control the DMA activities can easily be placed in a single integrated circuit. DMA controllers are available that can supervise DMA transfers involving several I/O devices, each with a different priority of access to the system bus.
When the CPU intends to receive or send (read or write) a block of data, it issues a request to the DMA module with certain information, and the DMA transfer operation then proceeds as follows:
- • The read or write control line between the processor and the DMA module is used by the CPU to intimate the DMA whether a read or write is requested;
- • The identification (address) of the I/O device to be involved is intimated by the CPU to the DMA, and is communicated on the data line;
- • The CPU executes two I/O instructions, which load the DMA registers: the IOAR with the base address of the main-memory region to be used in the data transfer and the DC with the number of words to be transferred to or from this region;
- • When the DMA controller is ready to transmit or receive data, it activates the DMA REQUEST line to the CPU. The CPU then activates DMA ACKNOWLEDGE and there after relinquishes control of the data and address lines, and gets back its own work. The CPU now waits for the next DMA breakpoint. In fact, DMA REQUEST and DMA ACKNOWLEDGE are essentially BUS REQUEST and BUS GRANT lines for the system bus. Simultaneous DMA requests from several DMA controllers, if any, can be resolved by using one of the bus-priority control techniques;
- • The DMA controller now transfers data directly to or from the main memory. After a word is transferred, IOAR and DC are incremented and decremented, respectively;
- • If the DC is decremented to zero, the DMA controller again relinquishes control of the system bus. It may also send an interrupt signal to the CPU, and the CPU responds either by halting the I/O device or by initiating a new DMA transfer;
- • If the DC is not decremented to zero, but the I/O device is not ready to send or receive the next batch of data, the DMA controller returns control to the CPU by releasing the system bus and deactivating the DMA REQUEST line. The CPU responds by deactivating DMA ACKNOWLEDGE and resumes normal operation.
Intel 8257 chip supports four DMA channels, by which four peripheral devices can independently request for DMA data transfer at a time. The DMA controller has 8-bit internal data buffer, a read/write unit, a control unit, a priority-resolving unit along with a set of registers. Intel 8237 critically differs architecturally from 8257 and provides a better performance compared to 8257. It is an advanced programmable DMA controller capable of transferring a byte or a bulk of data between system memory and peripherals in either direction. Memory-to-memory data transfer facility is also available in this chip. This DMA controller can be interfaced to the processor family 80x86 with DRAM memory. Similar to 8257, the 8237 also supports four independent DMA channels (numbered 0, 1, 2, and 3), which may be expanded to any number, by cascading more number of 8237.
Here, each channel can be programmed independently, and any one of the channels may be made individually active at any point of time. But the distinctive feature of this chip is that it provides many programmable controls and dynamic reconfigurability attributes, which eventually enhance the data transfer rate of the system remarkably.
Many CPUs like those of the MC 68000 series have no internal mechanism for resolving multiple DMA requests-, this must be done by external logic. The DMA controller 68450 chip contains four copies of the basic DMA controller logic that enables the 68450 to carry out a sequence of DMA block transfers without reference to the CPU. When the current data count reaches zero, a DMA channel that has been programmed for chained DMA transfer (as mentioned in Bullet 4) fetches the new value of DC and IOAR from a memory region (MR) that stores a set of DC-IOAR pairs. A special memory address register in every DMA channel holds the base address of MR.
DMA is subsequently accepted as a standard approach, commonly used in all personal computers, minicomputers, and mainframes for carrying out I/O activities.
Different Transfer Types
Under DMA control, data can be transferred in one of the several following ways:
- 1. DMA block transfer: This type transfers a sequence of data word of arbitrary length in a single continuous burst when the DMA controller is the master of the system bus. Block DMA supports the maximum I/O data transmission rate, but it may cause the CPU to remain inactive for relatively longer periods. Auto-initialization may be programmed in this mode. This DMA mode is particularly required by secondary memory devices like magnetic disk drives where data transmission cannot be stopped or slowed down without loss of data, and block transfers are the norm.
- 2. Cycle stealing: This approach allows the DMA controller to use the system bus interspersed with CPU bus transactions while transferring long blocks of I/O data by a sequence of DMA bus transactions. During these cycles, the CPU will have to wait to get the control of the bus because DMA always has a higher bus priority than the CPU, as I/O devices cannot tolerate delays. The process of taking bus cycles away from the CPU by a DMA controller, or by way of forcing the processor to temporarily suspend its operation, is called cycle stealing. Cycle stealing not only reduces the maximum I/O transfer rate, but also reduces the interference by the DMA controller in the CPU's activities. It is possible to completely eliminate the interference by designing the DMA interface so that the bus cycles are to be stolen only when the CPU is not actually using the system bus. This is known as transparent DMA. Thus, by varying the degrees of overlap between CPU and DMA operations, it is possible to accommodate many I/O devices having different data- transfer characteristics.
- 3. Demand transfer: In this mode, the device continues transfers until DC (count) is reached zero, or an external condition (end of process) is detected, or DMA REQUEST signal goes inactive.
- 4. Cascading: In this mode, more than one 8237 can be connected level-wise together to the host 8237 to provide more than four DMA channels. The priorities of the DMA requests, however, may be preserved at each level.
Implementation Mechanisms: Different Approaches
The DMA mechanisms can be implemented in a variety of ways.
- 1. Here, the DMA module and the I/O devices individually share the system bus with the CPU and memory as shown in Figure 5.6. The DMA module is acting here as a surrogate (a substitute) for the CPU. The DMA controller uses PIO to exchange data between memory and an I/O device. Like the CPU-controlled PIO, this approach also requires two bus cycles for each transfer of a word. This configuration, while it may be looked inexpensive, is clearly inefficient also;
- 2. This drawback of consuming more bus cycles at the time of data transfer can be reduced substantially if the DMA and I/O functions are integrated. This means that there is a separate path between the DMA module and one or more I/O modules that does not include system bus. This is shown in Figure 5.7. Here, the DMA logic may be a part of an I/O module or may be a completely separate module that controls one or more I/O modules;
- 3. The approach already mentioned in (2) can be modified one step further by connecting I/O modules to the DMA module using an I/O bus. The transfer of data between the DMA and I/O modules can then take place off the system bus and the system bus will be used by the DMA module only at the time of exchanging data with the memory. This approach will reduce the number of I/O interfaces in the DMA module to one, and at the same time offers an easily expandable configuration. Figure 5.8 shows a schematic design of this approach.
Single bus: DMA detached from 1/0.
Single bus: DMA-I/O integrated.
DMA-I/O with separate I/O bus.
I/O Processor (I/O Channels)
While the introduction of a DMA controller in the I/O nodule is a radical breakthrough, it is after all not able to totally freeing the CPU. Moreover, DMA sometimes uses many bus cycles at a time, as in the case of a disk I/O, and during these cycles, the CPU will have to wait for bus access (as DMA always has a higher bus priority) that summarily restricts the performance to attain the desired level. Further development is thus targeted in quest of an enhanced I/O module so that this modified I/O module could control the entire I/O operation on its own, setting the CPU totally aside. This type of I/O module that almost fully relieves the CPU from the burden of I/O execution is often referred to as an I/O channel. The final and ultimate approach is then to convert this I/O channel to a full-fledged processor so that the CPU can now be relieved almost totally. This is accomplished by including a local memory to this I/O channel so that it can manage a large set of different devices with minimal or almost no involvement of the CPU. This module then basically consists of a local memory attached with a specialized processor and includes I/O devices. It shows that this unit as a whole then itself becomes a stand-alone computer. An I/O module having this kind of architecture is known as an I/O processor (IOP). An IOP can perform several independent data transfers between main memory and one or more I/O devices without recourse to the CPU. Usually, an IOP is connected to the devices it controls, by a separate bus system called the I/O bus or I/O interface. It is not uncommon for larger systems to use small computers as IOPs which are primarily communication links between I/O devices and main memory, and hence the use of the term channel for IOP. The IOPs are also called peripheral processing units (PPUs) to emphasize their subsidiary roles with respect to the CPU.
A channel or IOP is essentially a dedicated computer with its own instruction set processor to independently carry out entire I/O operations along with other processing tasks, such as arithmetic, logic, branching, and code translation required mostly for I/O processing. The CPU only initiates an I/O transfer by instructing the I/O channel to execute a specific program available in main memory, and then the CPU goes off. This program will indicate the device or devices to be taken, the area or areas of memory for storage, the priority, and the different types of actions to be taken in case of certain error conditions. The I/O channel uses this information and executes the entire I/O data processing, while CPU is fully devoted to its own work in parallel. When I/O activity is over, the channel interrupts the CPU and sends only all related necessary information. Traditionally, the use of I/O channels has been observed to be associated with mainframe or large-scale system attached with a large number of peripherals
(disks and tape storage devices), which are used simultaneously by many different users in multitasking as well as in on-line transaction processing (OLTP) environment handling bulk volume of data. As the development of chip-based microprocessors has dramatically progressed, the use of I/O channels has now extended to minicomputers and even to microcomputers. However, the fully developed I/O channel is best studied on the mainframe system, and possibly the best-known example in this regard is the flagship IBM/370 system.
The IOP in the IBM 370 system is commonly called a channel. A typical computer system may have a number of channels, and each channel may be attached to one or more similar or different types of I/O devices through I/O modules. Three types of channels are in common use: a selector channel, a multiplexor channel, and a block multiplexor channel (a hybrid of features of both the multiplexor and selector channels). The interface being used from an I/O module (channel) to a device (i.e. a device controller along with a device) is either a serial or a parallel. Although a parallel interface is traditionally a common choice for high-speed devices, but with the emergence of next-generation advanced high-speed serial interfaces, parallel interfaces have eventually lost their inherent importance to a considerable extent, and hence, are found to be much less common. However, the I/O channel is best implemented in the mainframe system, and possibly the well-known example in this regard is the flagship IBM/370 system.
A brief detail of I/O channel along with its different types, and its implementation in IBM/370 system, is given with figure in the website: http://routledge.com/9780367255732.
I/O Processor (IOP) And Its Organisation
The I/O channel has been finally promoted to a full-fledged IOP using a mechanism (already described in the "Introduction", Section 5.3.4) to make the CPU almost totally free from any I/O activities. The handshaking between the CPU and IOP at the time of establishing communications may take different forms depending mostly on the particular configuration of the computer being used. However, the memory unit in most cases acts as a mediator providing message centre (input-output communication region (IOCR)) facilities where each processor leaves some information for the IOP to follow. This is one form of indirect handshaking. The direct handshaking between CPU and IOP is generally done through dedicated control lines. Standard DMA or bus grant/acknowledge lines are also used for arbitration of the system bus between these two processors. However, Figure 5.9 illustrates here a schematic block diagram of a representative system containing an IOP. A sequence of steps is then required to be followed at the time of CPU-IOP interaction and communication for needed information exchange.
During IOP operation, the CPU is free and executes its own tasks with other programs. There may be a situation when the CPU and the IOP both compete with each other to get simultaneous memory access, and hence, the IOP is often restricted to have only a limited number of devices so that the number of memory accesses can be minimized. In the case of operation of a slower device, this may even lead to a situation of memory-access saturation, since I/O operations use DMA, and the CPU may then have to wait during this transfer, which may cause a notable degradation in CPU performance.
All the Intel CPUs have explicit I/O instructions to read or write bytes, words, or longs. These instructions specify the I/O port number desired, either directly as a field with the
Block diagram of a representative system containing an IOP.
instruction or indirectly using the register DX to hold it. In addition, of course, DMA chips (Intel 8257/8237) are frequently used to relieve the CPU from handling I/O burden. None of the Motorola chips have I/O instructions. It is expected that the I/O device register will be addressed via memory mapping. Here too, DMA is widely used.
A brief detail of IOP along with its working is given with figure in the website: http://routledge.com/9780367255732.