- List the benefits of secondary storage
- Identify storage media available for personal computers
- Differentiate the principal types of secondary storage
- Describe how data is stored on a disk
- Understand and appreciate the benefits of multimedia
- Understand how data is organized, accessed, and processed
Cheri Urquhart, Tate Kirschner, and Taylor Russell met in college, where they were studying to become architects. They did both undergraduate and graduate work together and then went their separate ways into the workforce. But they remained in the same city and kept in touch.
Seven years later, at a professional conference, their casual conversation over dinner turned serious and they began to consider forming their own architectural firm. The details of accomplishing this were complex and involved many months of planning. Our concern here is what they decided to do about computers and, particularly, computer storage.
Architectural drawings are made with special software; the software alone takes up many millions of bytes of storage. In addition, the drawings themselves are storage hogs. The three architects did not hesitate to include hard disks with many gigabytes of storage.
Another problem was the need to be able to produce computer-generated "walk-through" movies, a simulated tour to show their clients the planned structure. For this, they chose DVD-ROM, a type of high-capacity storage disk that can hold a full-length movie with room to spare.
Further, the architects each had computers at home. They thought they should upgrade the storage capacity of their individual computers so that they could bring work home. They also wanted some sort of transfer storage device so they could carry drawings on disk between home and office; they settled on the Zip drive, which holds a high-capacity diskette.
The issues just described are more complicated than most people face. However, it is true that disk storage is an ongoing issue for most users--we can never seem to get enough. A rule of thumb among computer professionals is to estimate disk needs generously and then double that amount. But estimating future needs is rarely easy.
Picture, if you can, how many filing-cabinet drawers would be required to hold the millions of files of, say, tax records kept by the Internal Revenue Service or historical employee records kept by General Motors. The record storage rooms would have to be enormous. Computers, in contrast, permit storage on tape or disk in extremely compressed form. Storage capacity is unquestionably one of the most valuable assets of the computer.
The Benefits of Secondary Storage
Secondary storage, sometimes called auxiliary storage, is storage separate from the computer itself, where you can store software and data on a semipermanent basis. Secondary storage is necessary because memory, or primary storage, holds data only temporarily: its contents are lost when the computer is turned off. Yet you probably want to reuse the information you have derived from processing; that is why secondary storage is needed.
The benefits of secondary storage can be summarized as follows:
- Space. Organizations may store the equivalent of a roomful of data on sets of disks that take up less space than a breadbox. A simple diskette for a personal computer can hold the equivalent of 500 printed pages, or one book. An optical disk can hold the equivalent of approximately 500 books.
- Reliability. Data in secondary storage is basically safe, since secondary storage is physically reliable. (We should note, however, that disks sometimes fail.) Also, it is more difficult for untrained people to tamper with data on disk than with data stored on paper in a file cabinet.
- Convenience. With the help of a computer, authorized users can locate and access data quickly.
- Economy. Together the three previous benefits indicate significant savings in storage costs. It is less expensive to store data on tape or disk (the principal means of secondary storage) than to buy and house filing cabinets. Data that is reliable and safe is less expensive to maintain than data subject to errors. But the greatest savings can be found in the speed and convenience of filing and retrieving data.
These benefits apply to all the various secondary storage devices, but, as you will see, some devices are better than others. The discussion begins with a look at the various storage media, including those used for personal computers, and then moves to what it takes to get data organized and processed.
Diskettes and hard disks are magnetic media; that is, they are based on a technology of representing data as magnetized spots on the disk--with a magnetized spot representing a 1 bit and the absence of such a spot representing a 0 bit. Reading data from the disk means converting the magnetized data to electrical impulses that can be sent to the processor. Writing data to disk is the opposite; it involves sending electrical impulses from the processor to be converted to magnetized spots on the disk. As Figure 1 shows, the surface of each disk has concentric tracks on it. The number of tracks per surface varies with the particular type of disk.
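The one-spot-per-bit idea can be made concrete with a short Python sketch. The function names `write_spots` and `read_spots` are our own, and the sketch follows the simplified model described above (ASCII text, one magnetized spot per 1 bit). Real drives record bits as changes in magnetic direction using various encoding schemes, so treat this only as an illustration of the bit patterns involved.

```python
def write_spots(text: str) -> str:
    """'Write' ASCII text as spots: '*' marks a magnetized spot (1 bit), '.' its absence (0 bit)."""
    return " ".join(
        format(byte, "08b").replace("1", "*").replace("0", ".")  # 8 bits per character
        for byte in text.encode("ascii")
    )


def read_spots(spots: str) -> str:
    """'Read' the spots back: turn each group of 8 spots into a bit string, then a character."""
    return "".join(
        chr(int(group.replace("*", "1").replace(".", "0"), 2)) for group in spots.split()
    )


print(write_spots("Hi"))              # .*..*... .**.*..*
print(read_spots(write_spots("Hi")))  # Hi
```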
A diskette is made of flexible mylar and coated with iron oxide, a substance that can be magnetized. A diskette can record data as magnetized spots on tracks on its surface. Diskettes became popular along with the personal computer. Most personal computers use the 3 1/2-inch diskette, whose capacity is 1.44 megabytes of data (Figure 2). The diskette has the protection of a hard plastic jacket and fits conveniently in a shirt pocket or purse. The key advantage of diskettes is portability: diskettes easily transport data from one computer to another. Workers, for example, carry their files from office computer to home computer and back on a diskette instead of carrying a stack of papers in a briefcase. Students use the campus computers but keep their files on their own diskettes. Diskettes are also handy for backup: you can easily place an extra copy of a hard disk file on a diskette.
However, the venerable 3 1/2-inch diskette, a standard for a decade, is being challenged. One possible new standard is a higher-capacity disk whose drive can also handle the traditional 3 1/2-inch diskette. But the technology with a head start is Iomega's Zip drive, already installed by 20 million users. The Zip drive holds 100-megabyte disks, about 70 times the capacity of traditional diskettes (Figure 3). Its disadvantage is that it is not compatible with 3 1/2-inch diskettes.
Figure 2 Diskette. A cutaway view of a 3 1/2-inch diskette.
Even a high-capacity diskette can be too small if, for example, you want to carry a large file back and forth between your office and home computers. One solution is data compression, the process of squeezing a big file into a small space. Compression can be as simple as removing extra space characters, using a single repeat character to stand for a string of repeated characters, and substituting shorter codes for frequently occurring characters. This kind of compression can reduce a text file to about 50 percent of its original size. Compression is performed by a program that uses a formula to determine how to compress or decompress data. Before it can be used again, the file must, of course, be decompressed. Incidentally, many users compress files that will be sent from one computer to another via data communications, to speed up the transfer.
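The repeat-character idea is usually called run-length encoding. The following minimal Python sketch assumes that each run is stored as a (character, count) pair rather than with the special repeat character a real program would use; the function names are hypothetical. It also shows the essential property of any compression scheme: decompressing must give back exactly the original.

```python
from itertools import groupby


def rle_compress(text: str) -> list[tuple[str, int]]:
    """Replace each run of identical characters with one (character, repeat count) pair."""
    return [(char, sum(1 for _ in run)) for char, run in groupby(text)]


def rle_decompress(pairs: list[tuple[str, int]]) -> str:
    """Expand each (character, repeat count) pair back into the original run."""
    return "".join(char * count for char, count in pairs)


line = "Total" + " " * 20 + "$42.00"
packed = rle_compress(line)
print(len(line), "characters became", len(packed), "pairs")  # 31 characters became 11 pairs
assert rle_decompress(packed) == line  # compression must be reversible
```

Notice that a pair takes more room than a single character, so on text with few repeats this naive version would make the file bigger; practical programs encode only long runs and combine several techniques.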
Figure 3 The Iomega Zip disk drive. Shown here is a separate drive unit, but many users have their Zip drive installed in a bay in the computer's housing.
A hard disk is a metal platter coated with magnetic oxide that can be magnetized to represent data. Hard disks come in a variety of sizes (Figure 4a). Several disks can be assembled into a disk pack. There are different types of disk packs, with the number of platters varying by model. Each disk in the pack has top and bottom surfaces on which to record data. Many disk devices, however, do not record data on the top of the top platter or on the bottom of the bottom platter.
A disk drive is a device that allows data to be read from a disk or written on a disk. A disk pack is mounted on a disk drive that is a separate unit connected to the computer. Large computers have dozens or even hundreds of external disk drives; in contrast, the hard disk for a personal computer is within the computer housing. In a disk pack all disks rotate at the same time, although only one disk is being read from or written on at any one time. The mechanism for reading or writing data on a disk is an access arm; it moves a read/write head into position over a particular track (Figure 5a). The read/write head on the end of the access arm hovers just above the track but does not actually touch the surface. When a read/write head does accidentally touch the disk surface, it is called a head crash and data can be destroyed. Data can also be destroyed if a read/write head encounters even minuscule foreign matter on the disk surface (Figure 5b). A disk pack has a series of access arms that slip in between the disks in the pack (Figure 5c). Two read/write heads are on each arm, one facing up to access the surface above it and one facing down to access the surface below it. However, only one read/write head can operate at any one time.
Figure 4 Magnetic disks. (a) Hard magnetic disks come in a variety of sizes. Shown here is a 3 1/2-inch hard drive for a personal computer. (b) These 3 1/2-inch diskettes are protected by a firm plastic exterior cover.
Most disk packs combine the disks, access arms, and read/write heads in an airtight, sealed module. These disk assemblies are put together in clean rooms so that even microscopic dust particles do not get on the disk surface.
Hard disks for personal computers are 3 1/2-inch disks in sealed modules (Figure 6). Hard disk capacity for personal computers has soared in recent years; older hard disks have capacities of tens of megabytes, but new ones offer multiple gigabytes of storage. Terabyte capacity is on the horizon. Although an individual probably cannot imagine generating enough output--letters, budgets, reports, pictures, and so forth--to fill a hard disk, software packages take up a lot of space and can make a dent rather quickly. Furthermore, graphics images and audio and video files require large amounts of disk space. Perhaps more important than capacity, however, is the convenience of speed. Personal computer users find that accessing files on a hard disk is significantly faster and more convenient than accessing files on a diskette.
Hard Disks in Groups
No storage system is completely safe, but a redundant array of independent disks, or simply RAID, comes close. RAID storage uses several small hard disks that work together as a unit. The simplest RAID system that protects data, RAID level 1, simply duplicates data on separate disk drives, a concept called disk mirroring (Figure 7b). Thus no data is lost if one drive fails. This approach is reliable but expensive, since every piece of data needs twice the disk space. Expense, however, may not be an issue when the value of the data is considered.
Figure 5 Read/write heads and access arms. (a) This photo shows a read/write head on the end of an access arm poised over a hard disk. (b) When in operation, the read/write head comes very close to the surface of the disk. On a disk, particles as small as smoke, dust, a fingerprint, and a hair loom large. If the read/write head encounters one of these, data is destroyed and the disk damaged. (c) Note that there are two read/write heads on each access arm. Each arm slips between two disks in the disk pack. The access arms move simultaneously, but only one read/write head operates at any one time.
Figure 6 Hard disk for a personal computer. The innards of a 3 1/2-inch hard disk with the access arm visible.
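Disk mirroring (RAID level 1) amounts to writing every block twice. The Python sketch below models each drive as a dictionary of numbered blocks; the class and method names are illustrative assumptions, not part of any real RAID controller interface. It also makes the cost visible: two drives hold only one drive's worth of data.

```python
class MirroredDisks:
    """RAID level 1 sketch: every write goes to both drives, so either copy can serve a read."""

    def __init__(self) -> None:
        self.drives = [{}, {}]          # each drive maps a block number to its contents
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        for drive, failed in zip(self.drives, self.failed):
            if not failed:
                drive[block] = data     # the same data goes to every working drive

    def read(self, block: int) -> bytes:
        for drive, failed in zip(self.drives, self.failed):
            if not failed:
                return drive[block]     # any surviving copy is enough
        raise OSError("both drives have failed")

    def fail(self, index: int) -> None:
        self.failed[index] = True       # simulate a drive failure


array = MirroredDisks()
array.write(7, b"payroll")
array.fail(0)              # the first drive fails...
print(array.read(7))       # ...but the mirror still returns b'payroll'
```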
Higher levels of RAID take a different approach called data striping (Figure 7c), which spreads the data across several disks in the array. In some RAID levels, one disk is used solely as a check disk, holding parity information computed from the data on the other disks. If a disk fails, the data it held can be reconstituted from the check disk and the surviving disks. Because several disks can work on a request at the same time, striping can also speed up data transfer. RAID is now the dominant form of storage for mainframe computer systems.
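The check disk in data striping works because of a property of the exclusive-OR (XOR) operation: XOR-ing all the data blocks produces a parity block, and XOR-ing the parity block with the surviving blocks re-creates any one missing block. The sketch below shows a single stripe with a dedicated check disk, the arrangement used by RAID levels 3 and 4; RAID level 5 spreads the parity blocks across all the disks instead. The function names and sample data are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Exclusive-OR equal-length blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data: bytes, data_disks: int) -> List[bytes]:
    """Split data into one equal-sized block per data disk (padding with zero bytes),
    then append a parity block for the check disk."""
    block_size = -(-len(data) // data_disks)              # ceiling division
    padded = data.ljust(block_size * data_disks, b"\0")
    blocks = [padded[i * block_size:(i + 1) * block_size] for i in range(data_disks)]
    return blocks + [xor_blocks(blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Re-create the one lost block (marked None) by XOR-ing all surviving blocks;
    this works because x ^ x == 0 cancels every block except the missing one."""
    missing = stripe.index(None)
    repaired = list(stripe)
    repaired[missing] = xor_blocks([block for block in stripe if block is not None])
    return repaired


disks = make_stripe(b"CUSTOMER FILE", data_disks=3)   # three data disks plus one check disk
damaged = list(disks)
damaged[1] = None                                     # the second data disk fails
restored = rebuild(damaged)
print(restored == disks)                              # True
print(b"".join(restored[:3]).rstrip(b"\0"))           # b'CUSTOMER FILE'
```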
How Data Is Organized on a Disk
There is more than one way of physically organizing data on a disk. The methods considered here are the sector method and the cylinder method.
The Sector Method
In the sector method each track on a disk is divided into sectors that hold a specific number of characters (Figure 8a). Data on the track is accessed by referring to the surface number, track number, and sector number where the data is stored. The sector method is used for diskettes.
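A worked example shows how sector addresses add up to a diskette's capacity. The sketch below assumes the standard layout of a 3 1/2-inch high-density diskette (2 surfaces, 80 tracks per surface, 18 sectors per track, 512 bytes per sector) and converts a (surface, track, sector) address into a single running sector number, in the order PC software conventionally uses. The constant and function names are our own.

```python
SURFACES = 2            # top and bottom of the diskette
TRACKS_PER_SURFACE = 80
SECTORS_PER_TRACK = 18
BYTES_PER_SECTOR = 512


def sector_index(surface: int, track: int, sector: int) -> int:
    """Turn a (surface, track, sector) address into one running sector number.
    Surfaces and tracks count from 0; sectors count from 1, as on PC diskettes.
    Both surfaces at a track position are used before moving to the next track."""
    if not (0 <= surface < SURFACES and 0 <= track < TRACKS_PER_SURFACE
            and 1 <= sector <= SECTORS_PER_TRACK):
        raise ValueError("address is outside the diskette")
    return (track * SURFACES + surface) * SECTORS_PER_TRACK + (sector - 1)


capacity = SURFACES * TRACKS_PER_SURFACE * SECTORS_PER_TRACK * BYTES_PER_SECTOR
print(capacity)                 # 1474560
print(sector_index(0, 0, 1))    # 0: the very first sector
print(sector_index(1, 79, 18))  # 2879: the very last sector
```

The product, 1,474,560 bytes, is 1,440 kilobytes of 1,024 bytes each, which is where the familiar "1.44 megabytes" comes from.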
The fact that a disk is circular presents a problem: The distance around the tracks on the outside of the disk is greater than that around the tracks on the inside. A given amount of data that takes up one inch of a track on the inside of a disk might be spread over several inches on a track near the outside of a disk. This means that the tracks on the outside store data less densely, wasting much of the space they offer.
Zone recording takes maximum advantage of the storage available by dividing a disk into zones and assigning more sectors to tracks in outer zones than to those in inner zones (Figure 8b). Since each sector on the disk holds the same amount of data, more sectors mean more data storage than if all tracks had the same number of sectors.
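To see why zone recording adds capacity, the sketch below compares a traditional layout, in which every track gets only as many sectors as fit on the innermost track, with layouts that divide the tracks into zones. All the geometry figures are hypothetical, chosen only for illustration; a zone count of 1 reproduces the traditional layout.

```python
import math

# Hypothetical geometry, chosen only for illustration.
INNER_RADIUS_MM = 15.0   # radius of the innermost track
OUTER_RADIUS_MM = 45.0   # radius of the outermost track
TRACKS = 300             # tracks per surface
SECTOR_LENGTH_MM = 0.5   # track length one sector needs at the drive's recording density


def track_radius(track: int) -> float:
    """Tracks are evenly spaced from the inner radius to the outer radius."""
    return INNER_RADIUS_MM + track * (OUTER_RADIUS_MM - INNER_RADIUS_MM) / (TRACKS - 1)


def sectors_that_fit(radius: float) -> int:
    """Whole sectors that fit around a track of the given radius."""
    return math.floor(2 * math.pi * radius / SECTOR_LENGTH_MM)


def total_sectors(zones: int) -> int:
    """Give every track in a zone as many sectors as fit on that zone's innermost track."""
    tracks_per_zone = math.ceil(TRACKS / zones)
    total = 0
    for first in range(0, TRACKS, tracks_per_zone):
        tracks_in_zone = min(tracks_per_zone, TRACKS - first)
        total += sectors_that_fit(track_radius(first)) * tracks_in_zone
    return total


for zones in (1, 4, 16):
    print(zones, "zone(s):", total_sectors(zones), "sectors per surface")
```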
The Cylinder Method
Another way to organize data on a disk pack is the cylinder method, shown in Figure 9. The organization in this case is vertical. The purpose is to reduce the time it takes to move the access arms of a disk pack into position. Once the access arms are in position, they are in the same vertical position on all disk surfaces.
Figure 7 RAID storage. (a) Data is stored on disk in traditional fashion. (b) Disk mirroring with RAID stores a duplicate copy of the data on a second disk. (c) In a system called data striping with RAID, data is scattered among several disks, with a check disk that holds parity information so that data lost on a bad disk can be re-created.
Figure 8 Sectors and zone recording. (a) When data is organized by sector, the address is the surface, track, and sector where the data is stored. (b) If a disk is divided into traditional sectors, as shown here on the left, each track has the same number of sectors. Sectors near the outside of the disk are wider, but they hold the same amount of data as sectors near the inside. If the disk is divided into recording zones, as shown on the right, the tracks near the outside have more sectors than the tracks near the inside. Each sector holds the same amount of data, but since the outer zones have more sectors, the disk as a whole holds more data than the disk on the left.
To appreciate this, suppose you had an empty disk pack on which you wished to record data. You might be tempted to record the data horizontally--to start with the first surface and fill track 000, track 001, track 002, and so on and then move to the second surface and again fill tracks 000, 001, 002, and so forth. Each new track and new surface, however, would require movement of the access arms, a relatively slow mechanical process.
Recording the data vertically, on the other hand, substantially reduces access arm movement. The data is recorded on the tracks that can be accessed by one positioning of the access arms--that is, on one cylinder. To visualize cylinder organization, pretend that a cylindrically shaped item, such as a tin can, is dropped straight down through all the disks in the disk pack. All the tracks thus encountered, in the same position on each disk surface, make up a cylinder. The cylinder method, then, means that all tracks of a certain cylinder on a disk pack are lined up one beneath the other, and all the vertical tracks of one cylinder are accessible by the read/write heads with one positioning of the access arm mechanism. Tracks within a cylinder are numbered according to this vertical perspective, from 0 on the top down to the last surface on the bottom.
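The advantage of the cylinder method can be measured by counting how often the access arms must move. This Python sketch fills a hypothetical disk pack of 10 surfaces and 200 tracks per surface in the two orders described above; the figures and function names are illustrative assumptions.

```python
# Hypothetical disk pack: 10 recording surfaces, 200 tracks per surface.
SURFACES = 10
TRACKS_PER_SURFACE = 200


def horizontal_order():
    """Fill every track on surface 0, then every track on surface 1, and so on."""
    for surface in range(SURFACES):
        for track in range(TRACKS_PER_SURFACE):
            yield surface, track


def cylinder_order():
    """Fill every surface at track position 0 (cylinder 0), then cylinder 1, and so on."""
    for track in range(TRACKS_PER_SURFACE):
        for surface in range(SURFACES):
            yield surface, track


def arm_movements(order) -> int:
    """Count how often the access arms must move. Switching to another surface at the
    same track position only selects a different read/write head, so it costs no movement.
    The initial positioning of the arms is not counted."""
    moves = 0
    current_track = None
    for _surface, track in order:
        if current_track is not None and track != current_track:
            moves += 1
        current_track = track
    return moves


print(arm_movements(horizontal_order()))  # 1999
print(arm_movements(cylinder_order()))    # 199
```

Filling surface by surface moves the arms 1,999 times, while filling cylinder by cylinder moves them only 199 times, once per new cylinder.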
The explosive growth in storage needs has driven the computer industry to provide inexpensive and compact storage with greater capacity. The optical disk fills this demanding shopping list (Figure 10a). The technology works like this: A laser hits a layer of metallic material spread over the surface of a disk. When data is being entered, heat from the laser produces tiny spots on the disk surface. To read the data, the laser scans the disk, and a lens picks up different light reflections from the various spots.
Figure 9 Cylinder data organization. To visualize the cylinder form of organization, imagine dropping a cylinder such as a tin can straight down through all the disks in the disk pack. Within cylinder 150, the track surfaces are vertically aligned and are numbered vertically from top to bottom.
Optical storage technology is categorized according to its read/write capability. Read-only media are disks recorded by the manufacturer and can be read from but not written to by the user. Such a disk cannot, obviously, be used for your files, but manufacturers can use it to supply software. An applications software package could include a dozen diskettes or more; all these can fit on one optical disk with room to spare. Furthermore, software can be more easily installed from a single optical disk than from a pile of diskettes.
Write-once, read-many media, also called WORM media, may be written to once. Once filled, a WORM disk becomes a read-only medium. A WORM disk is nonerasable. For applications demanding secure storage of original versions of valuable documents or data, such as legal records, the primary advantage of nonerasability is clear: Once they are recorded, no one can erase or modify them.
A hybrid type of disk, called magneto-optical (MO), combines the best features of magnetic and optical disk technologies. A magneto-optical disk has the high-volume capacity of an optical disk but can be written over like a magnetic disk. The disk surface is coated with plastic and embedded with magnetically sensitive metallic crystals. To write data, a laser beam melts a tiny spot on the plastic surface and a magnet aligns the crystals before the plastic cools. The crystals are aligned so that some reflect light and others do not. When the data is later read by a laser beam, only the crystals that reflect light are picked up.
Figure 10 Optical disks. (a) Optical disks store data using laser beam technology. (b) Many laptop computers include a CD-ROM drive. Laptop users can use CD-ROM applications to make on-the-road presentations or can pop in a CD-ROM encyclopedia to find some needed information.
A variation on optical storage technology is the CD-ROM, for compact disk read-only memory. CD-ROM has a major advantage over other optical disk designs: The disk format is identical to that of audio compact disks, so the same dust-free manufacturing plants that are now stamping out digital versions of Kenny G or Jewel can easily convert to producing anything from software to a digitized encyclopedia. Furthermore, CD-ROM storage is substantial--up to 660 megabytes per disk, the equivalent of more than 400 standard 3 1/2-inch diskettes.
Keep in mind that a CD-ROM cannot be used in your personal computer's diskette drive; you must have a CD-ROM drive on your computer (or, as we will discuss shortly, a DVD drive). Today, even laptop computers have CD-ROM drives (Figure 10b). Although CD-ROMs are read-only, a different technology called CD-R permits writing on optical disks--but just once; mistakes cannot be undone. CD-R technology requires a CD-R drive, CD-R disks, and the accompanying software. Once a CD-R disk is written on, it can be read not only by the CD-R drive but by any CD-ROM drive. Another variation, CD-RW, is more flexible, permitting reading, writing, and rewriting.
The new storage technology that outpaces all others is called DVD-ROM, for digital versatile disk (originally digital video disk). Think of a DVD, as it is called for short, as an overachieving CD-ROM. Although the two look the same, a DVD has an astonishing 4.7-gigabyte capacity, about seven times that of a CD-ROM. And that is just the plain variety. A DVD can also carry two layers of information on a single side, one semitransparent and one fully reflective; this so-called double-layered DVD surface can hold about 8.5GB. Furthermore, DVDs can be recorded on both sides, bumping capacity to about 17GB (Figure 11). A DVD-ROM drive can also read CD-ROMs. It is not surprising that DVD-ROM technology is seen as a replacement for CD-ROMs over the next few years.
Figure 11 DVD-ROM. A DVD-ROM can use one or two sides, with each side having one or two layers. Since a single layer holds 4.7 gigabytes, and the second almost as much, a DVD-ROM with two sides and two layers per side can hold almost four times that much, or 17 gigabytes.
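The capacity comparisons in this section are simple ratios, and a few lines of Python can check them. The sketch uses the figures quoted in the text and, as an assumption, treats 1 gigabyte as 1,000 megabytes.

```python
# Capacities as quoted in this chapter, in megabytes (1 GB taken as 1,000 MB here).
CAPACITY_MB = {
    "3 1/2-inch diskette": 1.44,
    "Zip disk": 100,
    "CD-ROM": 660,
    "DVD, one side, one layer": 4_700,
    "DVD, one side, two layers": 8_500,
    "DVD, two sides, two layers": 17_000,
}


def times_larger(bigger: str, smaller: str) -> float:
    """How many times the capacity of one medium fits into another."""
    return CAPACITY_MB[bigger] / CAPACITY_MB[smaller]


print(round(times_larger("Zip disk", "3 1/2-inch diskette")))        # 69, "about 70 times"
print(round(times_larger("CD-ROM", "3 1/2-inch diskette")))          # 458, "more than 400"
print(round(times_larger("DVD, one side, one layer", "CD-ROM"), 1))  # 7.1, "seven times"
print(round(times_larger("DVD, two sides, two layers",
                         "DVD, one side, one layer"), 1))            # 3.6, "almost four times"
```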
Operating very much like CD-ROM technology, DVD uses a laser beam to read microscopic spots that represent data. But DVD uses a laser with a shorter wavelength, permitting it to read more densely packed spots, thus increasing the disk capacity. The benefits of this storage capacity are many--full-length movies and exquisite sound. Audio quality on DVD is comparable to that of current audio compact disks. DVDs will eventually hold high-volume business data. It is just a matter of time until all new personal computers will come with a DVD drive as standard equipment. The writable version of DVD is DVD-RAM, whose standards are being hammered out.
If you have a CD-ROM or a DVD-ROM drive, you are on your way to one of the computer industry's great adventures: multimedia.
Multimedia stirs the imagination. For example, have you ever thought that you could see a film clip from Gone with the Wind on your computer screen? One could argue that such treats are already available on videocassette, but the computer version provides an added dimension for this and other movies: reviews by critics, photographs of movie stars, lists of Academy Awards, the possibility of user input, and much more. Software described as multimedia typically presents information with text, illustrations, photos, narration, music, animation, and film clips (Figure 12). Until the optical disk, placing this much data on a disk was impractical. However, the large capacity of optical disks means that the kinds of data that take up huge amounts of storage space--photographs, music, and film clips--can now be readily accommodated.
To use multimedia software, you must have the proper hardware. In addition to the aforementioned CD-ROM or DVD-ROM drive, you also need a sound card or sound chip (installed internally) and speakers, which may rest externally on either side of the computer or be built into the computer housing. Special software accompanies the drive and sound card. In particular, if full-motion video is important to you, be sure your computer supports MPEG (Moving Picture Experts Group), a set of widely accepted video standards. Another video-related issue is the speed of the drive: the faster, the better. A higher drive speed means faster data transfer and smoother video on the screen.
Figure 12 Multimedia applications. Multimedia applications offer everything from games to business advice. These four samples include (a) a look at Dangerous Creatures, complete with movie clips, fierce animal sounds, and not-so-fierce baby animals; (b) selections of plants and scenes to help plan landscaping; (c) everything you need to learn Russian, including the clicked/spoken alphabet and phrases; and (d) the popular interactive game Riven, which uses movement and music to enhance the adventure.
Should your next computer be a multimedia personal computer? Absolutely. There is no doubt that multimedia is the medium of choice for all kinds of software.
If you take a moment to peruse the racks of multimedia software in your local store, you can see that most of the current offerings come under the categories of entertainment or education--or possibly both. You can study and hear works by Stravinsky or Schubert. You can explore the planets or the ocean bottom through film clips and narrations by experts. You can be "elected" to Congress, after which you tour the Capitol, decorate your office, hire staff, and vote on issues. You can study the battle of Gettysburg--and even change the outcome. You can study the Japanese language, seeing the symbols and hearing the intonation. You can buy multimedia versions of reference books, magazines, children's books, and entire novels.
But this is just the beginning. Businesses are already moving to this high-capacity environment for street atlases, national phone directories, and sales catalogs. Coming offerings will include every kind of standard business application, all tricked out with fancy animation, photos, and sound. Educators will be able to draw upon the new sight and sound for everything from human anatomy to time travel. And just imagine the library of the future, consisting not only of the printed word but also of photos, film, animation, and sound recordings--all flowing from the computer.
We saved magnetic tape storage for last because it now plays a subordinate role in storage technology. Magnetic tape looks like the tape used in music cassettes: plastic tape with a magnetic coating. As in other magnetic media, data is stored as extremely small magnetic spots. Tapes come in a number of forms, including 1/2-inch-wide tape wound on a reel, 1/4-inch-wide tape in data cartridges and cassettes, and tapes that look like ordinary music cassettes but are designed to store data instead of music. The amount of data on a tape is expressed in terms of density, which is the number of characters per inch (cpi) or bytes per inch (bpi) that can be stored on the tape.
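Density translates directly into capacity: multiply the bytes per inch by the length of the tape in inches. The sketch below uses illustrative figures for a traditional reel of 1/2-inch tape (2,400 feet at 6,250 bpi); the function name is our own, and the result is an upper bound, because real tapes leave gaps between blocks of records.

```python
def raw_tape_capacity(bytes_per_inch: int, length_feet: int) -> int:
    """Upper bound on the bytes a tape can hold: density times length in inches.
    Real tapes hold less, because gaps are left between blocks of records."""
    inches = length_feet * 12
    return bytes_per_inch * inches


# Illustrative figures: a 2,400-foot reel of 1/2-inch tape recorded at 6,250 bpi.
print(raw_tape_capacity(6_250, 2_400))   # 180000000, about 180 megabytes before gaps
```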
Figure 13 Magnetic tape units. Tapes are always protected by glass from outside dust and dirt. These modern tape drives, called "stackers," accept several cassette tapes, each with its own supply and take-up reel.
The highest-capacity tape is the digital audio tape, or DAT, which uses a different method of recording data. In a method called helical scan recording, the tape wraps partway around a tilted, rapidly rotating read/write head. This places the data in diagonal bands that run across the tape rather than down its length, producing high density and faster access to data.
Figure 13 shows a magnetic tape unit that might be used with a mainframe. The tape unit reads and writes data using a read/write head. It also has an erase head: when the computer is writing on the tape, the erase head first erases any data previously recorded.
Two reels are used, a supply reel and a take-up reel. The supply reel, which has the tape with data on it or on which data will be recorded, is the reel that is changed. The take-up reel always stays with the magnetic tape unit. Many cartridges and cassettes have the supply and take-up reels built into the same case.
Tape now has a limited role because disks have proved to be the superior storage medium. Disk data is quite reliable, especially within a sealed module. Furthermore, as will be shown, disk data can be accessed directly, as opposed to sequential data on tape, which can be accessed only by passing by all the data ahead of it on the tape. Consequently, the primary role of tape today is as an inexpensive backup medium.
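The difference between sequential and direct access shows up in how many records must be read to reach the one you want. In this minimal sketch, a list of records stands in for a tape and a Python dictionary stands in for a disk's ability to go straight to a record's location; both are simplifying assumptions, and the function names are hypothetical.

```python
def read_from_tape(tape: list[tuple[int, str]], key: int) -> tuple[str, int]:
    """Sequential access: pass over every record ahead of the wanted one.
    Returns the record's data and how many records had to be read."""
    for records_read, (record_key, data) in enumerate(tape, start=1):
        if record_key == key:
            return data, records_read
    raise KeyError(key)


def read_from_disk(disk: dict[int, str], key: int) -> tuple[str, int]:
    """Direct access: go straight to the record (modeled here as a dictionary lookup)."""
    return disk[key], 1


records = [(n, f"record {n}") for n in range(1, 10_001)]
print(read_from_tape(records, 9_500))        # ('record 9500', 9500)
print(read_from_disk(dict(records), 9_500))  # ('record 9500', 1)
```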
Although a hard disk is an extremely reliable device, it is subject to electromechanical failures that cause loss of data, as well as physical damage from fire and natural disasters. Furthermore, data files, particularly those accessed by several users, are subject to errors introduced by users. There is also the possibility of errors introduced by software. With any method of data storage, a backup system--a way of storing data in more than one place to protect it from damage and errors--is vital. As already noted, magnetic tape is used primarily for backup purposes. For personal computer users, an easy and inexpensive way to back up a hard disk file is simply to copy it to a diskette or Zip disk whenever it is updated. But this is not practical for a system with many files or many users.
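The copy-it-whenever-it-is-updated approach to personal backup can be automated. The sketch below, with a hypothetical function name, copies a file into a backup folder only if no backup exists yet or the file has been modified since the last copy. It relies on the standard library's `shutil.copy2`, which copies a file together with its modification time.

```python
import shutil
import tempfile
from pathlib import Path


def back_up_if_updated(source: Path, backup_dir: Path) -> bool:
    """Copy source into backup_dir when no backup exists yet or the source has changed
    since the last copy. Returns True if a copy was made."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / source.name
    if backup.exists() and backup.stat().st_mtime_ns >= source.stat().st_mtime_ns:
        return False                   # the backup is already current
    shutil.copy2(source, backup)       # copy2 also copies the modification time
    return True


with tempfile.TemporaryDirectory() as work:
    report = Path(work) / "budget.txt"
    report.write_text("Q1 budget draft\n")
    print(back_up_if_updated(report, Path(work) / "backup"))  # True: first backup
    print(back_up_if_updated(report, Path(work) / "backup"))  # False: nothing changed
```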
Personal computer users have the option of purchasing their own tape backup system, to be used on a regular basis for copying all data from the hard disk to a high-capacity tape. Data thus saved can be restored to the hard disk later if needed. A key advantage of a tape backup system is that it can copy the entire hard disk in minutes; with gigabytes of hard disk space to protect, swapping diskettes in and out of the machine is simply not feasible. Further, tape backup can be scheduled to take place when you are not going to be using the computer.
Organizing and Accessing Stored Data
As users of computer systems, we offer data as we are instructed to do, such as punching in our identification code at an automated teller machine or perhaps filling out a form with our name and address. But data cannot be dumped helter-skelter into a computer. Some computer professional--probably a programmer or systems analyst--has to have planned how data from users will be received, organized, and stored and also in what manner data will be processed by the computer.
This kind of storage goes beyond what you may have done to store a memo created in word processing. Organizations that store data usually need a lot of data on many subjects. For example, a charitable organization would probably need detailed information about donors, names and schedules of volunteers, and perhaps a schedule of fund-raising events. A factory would need to keep track of inventory (name, identification number, location, quantity, and so forth), the scheduled path of the product through the assembly line, records of quality-control checkpoints, and much more. All this data must be organized and stored according to a plan. First consider how data is organized.
Data: Getting Organized
To be processed by the computer, raw data is organized into characters, fields, records, files, and databases. First is the smallest element, the character.
- A character is a letter, digit, or special character (such as $, ?, or *).
- A field contains a set of related characters. For example, suppose that a health club is making address labels for a mailing. For each person it might have a member number field, a name field, a street address field, a city field, a state field, a zip code field, and a phone number field.
- A record is a collection of related fields. Thus, on the health club mailing list, one person's member number, name, address, city, state, zip code, and phone number constitute a record.
- A file is a collection of related records. All the member records for the health club compose a file. Figure 14 shows how data for a health club member might look.
- A database is a collection of interrelated files stored together with minimum redundancy. Specific data items can be retrieved for various applications. For instance, if the health club is opening a new outlet, it can pull out the names of people with zip codes near the new club and send them an announcement.
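The character-field-record-file hierarchy maps naturally onto program data structures. In the Python sketch below, each field of the health club example becomes an attribute, a `Member` object is one record, and a list of members is the file; the names, addresses, and the `members_near` helper are invented for illustration. The last line mimics the new-outlet mailing by pulling out members with nearby zip codes.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One record: a collection of related fields about one health club member."""
    member_number: int   # each field holds a set of related characters
    name: str
    street: str
    city: str
    state: str
    zip_code: str
    phone: str


# A file: a collection of related records (sample data invented for illustration).
members = [
    Member(1001, "Ana Ruiz", "12 Elm St", "Portland", "OR", "97201", "503-555-0101"),
    Member(1002, "Ben Cho", "48 Oak Ave", "Portland", "OR", "97230", "503-555-0102"),
    Member(1003, "Cy Dunn", "7 Pine Rd", "Salem", "OR", "97301", "503-555-0103"),
]


def members_near(file: list[Member], zip_prefix: str) -> list[str]:
    """Names of members whose zip code starts with the given prefix."""
    return [m.name for m in file if m.zip_code.startswith(zip_prefix)]


print(members_near(members, "972"))   # ['Ana Ruiz', 'Ben Cho']
```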
A field of particular interest is the key, a unique identifier for a record. It might seem at first that a name--of a person, say, or a product--would be a good key; however, since some names may be the same, a name field is not a good choice for a key. When a file is first computerized, existing description fields are seldom used as keys. Although a file describing people might use a Social Security number as the key, it is more likely that a new field will be developed that can be assigned unique values, such as customer number or product number.
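The problem with using names as keys is easy to demonstrate. In this sketch (names invented for illustration), a Python dictionary plays the role of a file indexed by its key; keying by name loses a record, while a newly assigned customer number keeps every record distinct.

```python
from itertools import count

# Hypothetical customer names; two different customers share a name.
names = ["Pat Kim", "Lee Wong", "Pat Kim"]

# Keying by name silently loses a record: the second "Pat Kim" overwrites the first.
by_name = {name: f"customer {i}" for i, name in enumerate(names)}
print(len(by_name))        # 2, though there are 3 customers

# Instead, assign a new field with guaranteed-unique values: a customer number.
next_number = count(start=5001)
by_number = {next(next_number): name for name in names}
print(by_number)           # {5001: 'Pat Kim', 5002: 'Lee Wong', 5003: 'Pat Kim'}
```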
In addition to organizing the expected data, a plan must be made to access the data on files.
||The File Plan: An Overview|
Now that you have a general idea of how data is organized, you are ready to look at the process used to decide how to place data on a storage medium. Consider this chain: (1) It is the application--payroll, airline reservations, inventory control, whatever--that determines how the data must be accessed by users. (2) Once an access method has been determined, it follows that there are certain ways the data must be organized so that the needed access is workable. (3) The organization method, in turn, limits the choice of storage medium. The discussion begins with an appreciation of application demands, then moves to a detailed look at organization and access.
The following application examples illustrate how an access decision might be made.
- A department store offers its customers charge accounts. When a customer makes a purchase, a sales clerk needs to be able to check the validity of the customer's account while the customer is waiting. The clerk needs immediate access to the individual customer record in the account file.
- A major oil company supplies its charge customers with credit cards, which it considers sufficient proof of credit at the time of purchase. The charge slips collected by gas stations are forwarded to the oil company, which processes them in order of account number. Unlike the retail example just given, the company does not need access to any one record at a specific time but merely needs access to all customer charge records when it is time to prepare bills.
- A city power and light company employee accepts reports of burned-out streetlights from residents over the phone. Using a key made up of unique address components, the clerk immediately finds the record for the offending streetlight and prints out a one-page report that is routed to repair units within 24 hours. To produce such quick service for an individual streetlight, the employee needs to be able to access the individual streetlight record.
- Next-month schedules for airline flight attendants are computer-produced monthly and delivered to the attendants' home-base mailboxes. The schedules are put together from information based on flight records, and the entire file can be accessed monthly at the convenience of the airline and the computer-use plan.
As you can see, the question of access comes down to whether a particular record is needed right away, as it was in the first and third examples (the department store and the streetlight report). This immediate need for a particular record means access must be direct. It follows that the organization must also be direct, or at least indexed, and that the storage medium must be disk. Furthermore, the type of processing, a related topic, must be transaction processing. The critical distinction is whether or not immediate access to an individual record is needed. The following discussion examines all these topics in detail. Although organization type is determined by the type of access required, the file must be organized before it can be accessed, so organization is the first topic.
|Figure 14 How data is organized. Whether stored on tape or on disk, data is organized into characters, fields, records, and files. A file is a collection of related records. These drawings represent (a) magnetic tape and (b) magnetic disk.|
||File Organization: Three Methods|
There are three major methods of storing files of data in secondary storage:
| ||Sequential file organization, in which records are organized in a particular order|
| ||Direct file organization, in which records are not organized in any special order|
| ||Indexed file organization, in which records are organized sequentially but indexes are built into the file to allow a record to be accessed either sequentially or directly|
Sequential File Organization. Sequential file processing means that records are in order according to a key field. As noted earlier, a file containing information on people will be in order by a key that uniquely identifies each person, such as Social Security number or customer number. If a particular record in a sequential file is wanted, all the prior records in the file must be read before the desired record is reached. Tape storage is limited to sequential file organization. Disk storage may be sequential, but records on disk can also be accessed directly.
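The cost of sequential access can be made concrete by counting how many records are read before the wanted one turns up. The sketch below is a simplified in-memory stand-in for a tape or disk file; `sequential_find` and the customer data are hypothetical. It also shows one common refinement: because the file is sorted by key, the search can stop as soon as it passes the point where the key would have been.

```python
def sequential_find(records, key):
    """Read records in order until the key is found.

    records: list of (key, data) pairs, sorted by key.
    Returns (data, reads), where reads counts the records examined.
    """
    reads = 0
    for record_key, data in records:
        reads += 1
        if record_key == key:
            return data, reads
        if record_key > key:  # sorted file: the key cannot appear later
            break
    return None, reads


# A hypothetical file already sorted by customer number (the key field).
customer_file = [(1005, "Diaz"), (1012, "Nguyen"), (1020, "Okafor"), (1031, "Smith")]

print(sequential_find(customer_file, 1020))  # ('Okafor', 3): two prior records read first
print(sequential_find(customer_file, 1015))  # (None, 3): stops at 1020
```

Finding the last record in a file of a million records would mean a million reads, which is why applications that need one record right away use direct or indexed organization instead.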
Direct File Organization. Direct file processing, or direct access, allows the computer to go directly to the desired record by using a record key; the computer does not have to read all preceding records in the file as it does if the records are arranged sequentially. Direct processing requires disk storage; in fact, a disk device is called a direct-access storage device (DASD) because the computer can go directly to the desired record on the disk. It is this ability to access any given record almost instantly that has made computer systems so convenient for people in service industries: catalog order-takers determining whether a particular sweater is in stock, for example, or bank tellers checking individual bank balances. An added benefit of direct-access organization is the ability to read, change, and return a record to its same place on the disk; this is called updating in place.
Obviously, if we have a completely blank area on the disk and can put records anywhere, there must be some predictable system for placing a record at a disk address and then retrieving the record at a subsequent time. In other words, once the record has been placed on a disk, it must be possible to find it again. This is done by choosing a certain formula to apply to the record key, thereby deriving a number to use as the disk address. Hashing, or randomizing, is the name given to the process of applying a mathematical operation to a key to yield a number that represents the address. Even though the record keys are unique, it is possible for a hashing scheme to produce the same disk address, called a synonym, for two different records; such an occurrence is called a collision. There are various ways to recover from a collision; one way is simply to use the next available record slot on the disk.
|Figure 15 A hashing scheme. Dividing the key number 1269 by the prime number 17 yields a remainder of 11, which can be used to indicate the address on a disk.|
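The division-remainder hashing of Figures 15 and 16, together with the "next available slot" way of recovering from a collision, can be sketched as follows. This is a minimal illustration, not a production scheme: a Python list stands in for the 13 disk addresses, and the function names and the placeholder record contents for keys 137 and 618 are hypothetical (only key 661, C. Kear, is named in Figure 16).

```python
TABLE_SIZE = 13  # a prime: the 13 addresses (0 through 12) of Figure 16


def hash_address(key, size=TABLE_SIZE):
    """Division-remainder hashing: the remainder of key / size is the address."""
    return key % size


def insert(table, key, record):
    """Store a record at its hashed address; on a collision, use the next free slot."""
    address = hash_address(key, len(table))
    for step in range(len(table)):
        slot = (address + step) % len(table)  # wrap around after the last address
        if table[slot] is None:
            table[slot] = (key, record)
            return slot
    raise RuntimeError("file is full")


def find(table, key):
    """Retrieve a record by repeating the same hash and the same probe sequence."""
    address = hash_address(key, len(table))
    for step in range(len(table)):
        slot = (address + step) % len(table)
        entry = table[slot]
        if entry is None:
            return None  # reached an empty slot: the key is not in the file
        if entry[0] == key:
            return entry[1]
    return None


table = [None] * TABLE_SIZE
print(hash_address(1269, 17))            # 11, as in Figure 15
print(insert(table, 137, "record 137"))  # 7
print(insert(table, 618, "record 618"))  # 8 (address 7 is taken: a collision)
print(insert(table, 661, "C. Kear"))     # 11
print(find(table, 618))                  # record 618
```

Because `find` repeats exactly the steps `insert` used, a record placed anywhere on the disk can be found again from its key alone. Deleting records would complicate this simple probe (an emptied slot could cut a search short), which is one reason real schemes are more elaborate.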
There are many different hashing schemes; although the example in Figure 15 is too simple to be realistic, it can give you a general idea of how the process works. An example of how direct processing works is provided in Figure 16.
|Figure 16 An example of direct access. Assume there are 13 addresses (0 through 12) available in the file. Dividing the key number 661, which is C. Kear's employee number, by the prime number 13 yields a remainder of 11. Thus, 11 is the address for key 661. However, for the key 618, dividing by 13 yields a remainder of 7, a synonym, since this address has already been used by the key 137, which also has a remainder of 7. Hence the address becomes the next location--that is, 8. Note, incidentally, that keys (and therefore records) need not appear in any particular order. (The 13 record locations available are, of course, too few to hold a normal file; a small number was used to keep the example simple.)|
Indexed File Organization. Indexed file processing, or indexed processing, is a third method of file organization, and it represents a compromise between the sequential and direct methods. It is useful in applications where a file needs to be in sequential order, but, in addition, access to individual records is needed.
An indexed file works as follows: Records are stored in the file in sequential order, but the file also contains an index. The index contains entries consisting of the key to each record stored on the file and the corresponding disk address for that record. The index is like a directory, with the keys to all records listed in order. For a record to be accessed directly, the record key must be located in the index; the address associated with the key is then used to locate the record on the disk. Accessing the entire file of records sequentially is simply a matter of beginning with the first record and proceeding one at a time through the rest of the records.
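An indexed file, as just described, supports both kinds of access. The sketch below assumes a single-level index searched by binary search (Python's standard `bisect` module); the `IndexedFile` class, the starting address of 200, and the sample records are hypothetical, and a dictionary stands in for addresses on the disk.

```python
import bisect


class IndexedFile:
    """Records stored in key order, plus an index of (key, address) pairs.

    A dict stands in for the disk (address -> record); the index lists
    every key in order with its record's address, like a directory.
    """

    def __init__(self, records, first_address=0):
        ordered = sorted(records, key=lambda r: r[0])  # store sequentially by key
        self.disk = {}
        self.index = []
        for offset, (key, data) in enumerate(ordered):
            address = first_address + offset
            self.disk[address] = (key, data)
            self.index.append((key, address))
        self._index_keys = [key for key, _ in self.index]

    def get(self, key):
        """Direct access: locate the key in the index, then read that address."""
        pos = bisect.bisect_left(self._index_keys, key)  # binary search of the index
        if pos < len(self._index_keys) and self._index_keys[pos] == key:
            address = self.index[pos][1]
            return self.disk[address][1]
        return None

    def scan(self):
        """Sequential access: begin with the first record and read each in turn."""
        for address in sorted(self.disk):
            yield self.disk[address]


f = IndexedFile([(1031, "Smith"), (1005, "Diaz"), (1020, "Okafor")], first_address=200)
print(f.get(1020))                   # Okafor
print([key for key, _ in f.scan()])  # [1005, 1020, 1031]
```

The same records serve a monthly report (`scan`) and an individual inquiry (`get`); the price is the extra space and upkeep of the index.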
Before proceeding with the actual processing of data, consider the physical activity of the disk as it accesses records directly.
||Disk Access to Data|
Three primary factors determine access time, the time needed to access data directly on disk:
| ||Seek time. This is the time it takes the access arm to get into position over a particular track. Keep in mind that all the access arms move as a unit, so they are simultaneously in position over a set of tracks that make up a cylinder.|
| ||Head switching. The access arms on the access mechanism do not move separately; they move together, all at the same time. However, only one read/write head can operate at any one time. Head switching is the activation of a particular read/write head over a particular track on a particular surface. Since head switching takes place at the speed of electricity, the time it takes is negligible.|
| ||Rotational delay. Once the access arm and read/write head are in position and ready to read or write data, the read/write head waits for a short period until the desired data on the track moves under it.|
Once the data has been found, the next step is data transfer, the process of transferring data between memory and the place on the disk track--from memory to the track if the computer is writing, from the track to memory if the computer is reading. One measure for the performance of disk drives is the average access time, which is usually measured in milliseconds (ms). Another measure is the data transfer rate, which tells how fast data can be transferred once it has been found. This usually will be stated in terms of megabytes of data per second.
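The arithmetic behind these measures is simple. On average, the desired data is half a revolution away from the read/write head, so average rotational delay is half the time of one revolution; head switching is treated as negligible, as noted above. The drive figures below (9 ms average seek, 7,200 revolutions per minute, 20 MB per second, a 4,096-byte block) are hypothetical values chosen only for illustration.

```python
def average_access_ms(avg_seek_ms, rpm):
    """Average access time = average seek time + average rotational delay.

    Head switching is treated as negligible. On average the wanted data
    is half a revolution away from the read/write head.
    """
    ms_per_revolution = 60_000 / rpm
    avg_rotational_delay_ms = ms_per_revolution / 2
    return avg_seek_ms + avg_rotational_delay_ms


def transfer_ms(num_bytes, mb_per_second):
    """Time to move data once it has been found (1 MB taken as 1,000,000 bytes)."""
    return num_bytes / (mb_per_second * 1_000_000) * 1000


# Hypothetical drive: 9 ms average seek, 7,200 rpm, 20 MB/s transfer rate.
access = average_access_ms(9.0, 7200)
transfer = transfer_ms(4096, 20)
print(f"access {access:.2f} ms + transfer {transfer:.2f} ms = {access + transfer:.2f} ms")
# access 13.17 ms + transfer 0.20 ms = 13.37 ms
```

With these assumed figures, finding the data takes far longer than moving it, which is why access time and transfer rate are quoted as separate measures.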
Once there is a plan for accessing the files, they can be processed. There are several methods of processing data files in a computer system. The two main methods are batch processing (processing data in groups at a more convenient later time) and transaction processing (processing data immediately, as it is received).
Batch processing is a technique in which transactions are collected into groups, or batches, to be processed at a time when the computer may have few online users and thus be more accessible, usually during the night. Unlike transaction processing, a topic coming up momentarily, batch processing involves no direct user interaction. Let us consider updating the health club address-label file. The master file, a semipermanent set of records, is, in this case, the list of all members of the health club and their addresses. The transaction file contains all changes to be made to the master file: additions (transactions to create new master records for new members), deletions (transactions with instructions to delete master records of members who have resigned from the health club), and revisions (transactions to change items such as street addresses or phone numbers in fields in the master records). Periodically, perhaps monthly or weekly, the master file is updated with the changes called for in the transaction file. The result is a new, up-to-date master file (Figure 17).
In batch processing, before a transaction file is matched against a master file, the transaction file must be sorted (usually by computer) so that all the transactions are in sequential order according to a key field. In updating the health club address-label file, the key is the member number assigned by the health club. The records on the master file are already in order by key. Once the changes in the transaction file are sorted by key, the two files can be matched and the master file updated.
During processing, the computer matches the keys from the master and transaction files, carrying out the appropriate action to add, revise, or delete. At the end of processing, a newly updated master file is created; in addition, an error report is usually printed. The error report shows actions such as an attempt to delete a nonexistent record or an attempt to add a record that already exists.
|Figure 17 How batch processing works. The purpose of this system is to update the health club's master address-label file. The updating will be done sequentially. (1) Changes to be made (additions, deletions, and revisions) are input with (2) a keyboard, sorted, and sent to a disk, where they are stored in (3) the transaction file. The transaction file contains records in sequential order, according to member number, from lowest to highest. The field used to identify the record is called the key; in this instance the key is the member number. (4) The master file is also organized by member number. (5) The computer matches transaction file data and master file data by member number to produce (6) a new master file and (7) an error report and a new member report. Note that since this was a sequential update, the new master file is a completely new file, not just the old file updated in place. The error report lists member numbers in the transaction file that were not in the master file and member numbers that were included in the transaction file as additions that were already in the master file.|
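The matching step of a sequential batch update, as described above and in Figure 17, can be sketched as a merge of two key-ordered files. This is a simplified illustration under stated assumptions: the `batch_update` function and the member data are hypothetical, files are Python lists, and at most one transaction is assumed per member number.

```python
def batch_update(master, transactions):
    """Sequentially match a master file against a transaction file.

    master: list of (member_number, record) pairs, already in key order.
    transactions: list of (member_number, action, record), where action is
        "add", "delete", or "revise"; assumes at most one per member number.
    Returns (new_master, errors); the old master list is not modified.
    """
    transactions = sorted(transactions, key=lambda t: t[0])  # sort by key first
    new_master, errors = [], []
    i = j = 0
    while i < len(master) or j < len(transactions):
        m_key = master[i][0] if i < len(master) else None
        t_key = transactions[j][0] if j < len(transactions) else None
        if t_key is None or (m_key is not None and m_key < t_key):
            # No transaction for this master record: copy it unchanged.
            new_master.append(master[i])
            i += 1
            continue
        key, action, record = transactions[j]
        j += 1
        if m_key is None or t_key < m_key:
            # No master record has this key: only an addition makes sense.
            if action == "add":
                new_master.append((key, record))
            else:
                errors.append(f"{key}: cannot {action}, no such member")
        elif action == "delete":
            i += 1  # drop the master record
        elif action == "revise":
            new_master.append((key, record))  # write the changed record
            i += 1
        else:
            # Adding a member already on the master file is an error;
            # the existing record is kept for the next comparison.
            errors.append(f"{key}: cannot add, member already exists")
    return new_master, errors


old_master = [(101, "Ames, 4 Elm St"), (105, "Bell, 9 Oak Ave"), (110, "Cruz, 2 Pine Rd")]
changes = [
    (110, "delete", None),
    (103, "add", "Diaz, 7 Ash Ct"),
    (105, "revise", "Bell, 15 Birch Ln"),
    (107, "delete", None),
    (101, "add", "Ames, 4 Elm St"),
]
new_master, errors = batch_update(old_master, changes)
for member in new_master:
    print(member)
for line in errors:
    print("ERROR", line)
```

As in Figure 17, the result is a completely new master file plus an error report listing an attempt to add an existing member (101) and an attempt to delete a nonexistent one (107). Each file is read once, front to back, which is exactly what sequential storage handles well.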
Transaction processing is a technique of processing transactions--a bank withdrawal, an address change, a credit charge--in random order, that is, in any order they occur. Note that although batch processing also uses transactions, in that case they are grouped together for processing; the phrase transaction processing means that each transaction is handled immediately. Transaction processing is real-time processing. Real-time processing means that a transaction is processed fast enough for the result to come back and be acted upon right away. For example, a teller at a bank can find out immediately what your bank balance is. For processing to be real-time, it must also be online--that is, the terminals must be connected directly to the computer. Transaction processing systems use disk storage because the disk drive can move directly to the desired record.
Advantages of transaction processing are immediate access to stored data (and thus immediate customer service) and immediate updating of the stored data. A sales clerk, for example, could access the computer via a terminal to verify the customer's credit and also record the sale via the computer (Figure 18). Later, by the way, those updated records can be batch-processed to bill all customers.
|Figure 18 How transaction processing works. The purposes of this retail sales system are to verify that a customer's credit is good, record the credit sale on the customer's record, and produce a sales receipt. Since customers may have the same name, the file is organized by customer account number rather than by name. Here Maria Rippee, account number 50130, wishes to purchase a coat for $179. (1) The sales clerk uses the terminal to input Maria's account number and the sale. (2) When the computer receives the data from the clerk, it uses the account number to find Maria's record on the disk file, verify her credit, and record the sale so that she will later be billed for it. (3) The computer returns an acceptance to the clerk's terminal. (4) The computer sends sales receipt information to the clerk's printer. All this is done within seconds while the customer is waiting. This example is necessarily simplified, but it shows a system that is real-time (immediate response) and online (directly connected to the computer).|
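The retail example in Figure 18 can be sketched as a single function that handles one transaction the moment it arrives. This is a simplified illustration, not the system in the figure: a Python dictionary keyed by account number stands in for the direct-access disk file, and the `credit_sale` function, the credit limit, and the starting balance are hypothetical.

```python
# Hypothetical customer file keyed by account number; a dict stands in for a
# direct-access disk file, so one record is found without reading the others.
accounts = {
    50130: {"name": "Maria Rippee", "credit_limit": 1000.00, "balance": 250.00},
}


def credit_sale(accounts, account_number, amount):
    """Process one sale immediately: verify credit, record the sale, return a receipt."""
    record = accounts.get(account_number)  # direct access by key
    if record is None:
        return "REJECTED: unknown account"
    if record["balance"] + amount > record["credit_limit"]:
        return "REJECTED: credit limit exceeded"
    record["balance"] += amount  # record the sale, updating in place
    return f"ACCEPTED: {record['name']} charged ${amount:.2f}"


print(credit_sale(accounts, 50130, 179.00))  # ACCEPTED: Maria Rippee charged $179.00
print(accounts[50130]["balance"])            # 429.0
```

Unlike the batch sketch, nothing is sorted or collected first; each sale is verified and recorded while the customer waits, and the updated balances remain available for a later batch billing run.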
||Batch and Transaction Processing: The Best of Both Worlds|
Numerous computer systems combine the best features of both methods of processing. Generally speaking, transaction processing is used for activities related to the current needs of people--especially workers and customers--as they go about their daily lives. Batch processing, by comparison, can be done at any time, even in the middle of the night, without worrying about the convenience of the people ultimately affected by the processing.
A bank, for instance, may use transaction processing to check your balance and individually record your cash withdrawal transaction during the day at the teller window. However, the deposit that you leave in an envelope in an "instant" deposit drop may be recorded during the night by means of batch processing. Printing your bank statement is also a batch process. Most store systems also combine both methods: A point-of-sale terminal finds the individual item price as a sale is made, but that same process captures inventory data, which may be batched and totaled to produce inventory reports.
Police license-plate checks for stolen cars work the same way. As cars are sold throughout the state, the license numbers, owners' names, and so on, are updated in the motor vehicle department's master file, usually via batch processing on a nightly basis. But when police officers see a car they suspect may be stolen, they can radio headquarters, where an operator with a terminal uses transaction processing to check the master file immediately to see if the car has been reported missing. Some officers have a laptop computer right in the car and can check the information themselves.
Auto junkyards, which often are computerized big businesses, can make an individual inquiry for a record of a specific part needed by a customer waiting on the phone or in person. As parts are sold, sales records are kept to update the files nightly using batch processing.
As you can see from these examples, both workers and customers eventually see the results of transaction processing in the reports output by batch processing. Managers will see further batch processing output in the form of information gathered and summarized about the processed transactions. And, finally, new transaction processing is possible based on the results of previous batch processing.
What is the future of storage? Perhaps holographic storage, which would be able to store thousands of pages on a device the size of a quarter and would be much faster than even the fastest hard drives. Whatever the technology, it seems likely that there will be greater storage capabilities in the future to hold the huge data files for law, medicine, science, education, business, and, of course, the government.
To have access to all that data from any location, we need data communications, the subject of the next chapter.
||Summary and Key Terms|
| || ||Secondary storage, sometimes called auxiliary storage, is storage separate from the computer itself, where software and data can be stored on a semipermanent basis. Secondary storage is necessary because memory, or primary storage, can be used only temporarily.|
| || ||The benefits of secondary storage are space, reliability, convenience, and economy.|
| || ||Diskettes and hard disks are magnetic media, based on a technology of representing data as magnetized spots on the disk. The surface of each disk has concentric tracks on it.|
| || ||Diskettes are made of flexible Mylar. Advantages of diskettes, as compared with hard disks, are portability and backup. The 3 1/2-inch diskette standard may be challenged by a new, higher-capacity disk whose drive can handle both the new disk and the traditional 3 1/2-inch disk, or perhaps by Iomega's Zip drive, whose disk has a high capacity but is not compatible with 3 1/2-inch diskettes.|
| || ||Data compression makes a large file smaller by temporarily removing redundant or nonessential items.|
| || ||A hard disk is a metal platter coated with magnetic oxide that can be magnetized to represent data. Several disks can be assembled into a disk pack.|
| || ||A disk drive is a machine that allows data to be read from a disk or written on a disk. A disk pack is mounted on a disk drive that is a separate unit connected to the computer. The disk access arm moves a read/write head into position over a particular track, where the read/write head hovers above the track. A head crash occurs when a read/write head touches the disk surface; it can destroy the data stored there.|
| || ||A redundant array of independent disks, or simply RAID, uses several small hard disks that work together as a unit. RAID level 1 duplicates data on separate disk drives, a technique called disk mirroring. Higher levels of RAID use data striping, spreading the data across several disks in the array; some levels dedicate one disk solely as a check disk that holds error-checking (parity) information, so that the data on a failed disk can be reconstructed.|
| || ||The sector method of recording data on a disk divides each track into sectors that hold a specific number of characters. Data on the track is accessed by referring to the surface number, track number, and sector number where the data is stored. Zone recording involves dividing a disk into zones to take maximum advantage of the storage available by assigning more sectors to tracks in outer zones than to those in inner zones.|
| || ||The cylinder method is a vertical organization of data on a disk pack. The set of tracks that can be accessed by one positioning of the access arms is called a cylinder.|
| || ||Optical disk technology uses a laser beam to record data as spots on the disk surface. To read the data, the laser scans the disk, and a lens picks up different light reflections from the various spots. Read-only media are recorded on by the manufacturer and can be read from but not written to by the user. Write-once, read-many media, also called WORM media, may be written to once. A hybrid type of disk, called magneto-optical (MO), has the large capacity of an optical disk but can be written over like a magnetic disk. CD-ROM, for compact disk read-only memory, which has a disk format identical to that of audio compact disks, can hold up to 660 megabytes per disk. CD-R technology lets users write data to an optical disk once. CD-RW technology is more flexible, permitting reading, writing, and rewriting.|
| || ||DVD-ROM, for digital versatile disk, has astonishing storage capacity, up to 17GB if both layers and both sides are used.|
| || ||Multimedia software typically presents information with text, illustrations, photos, narration, music, animation, and film clips, which is possible because of the large capacity of optical disks. MPEG (Moving Picture Experts Group) is a set of widely accepted video standards.|
| || ||Magnetic tape stores data as extremely small magnetic spots. The amount of data on a tape is expressed in terms of density, which is the number of characters per inch (cpi) or bytes per inch (bpi) that can be stored on the tape. The highest-capacity tape is digital audio tape, or DAT, which uses a different method of recording data. Through a method called helical scan recording, the data is placed in diagonal bands that run across the tape rather than down its length.|
| || ||A magnetic tape unit reads and writes data using a read/write head; when the computer is writing on the tape, the erase head first erases any data previously recorded. Two reels are used: a supply reel that has the data tape and a take-up reel that stays with the magnetic tape unit.|
| || ||A backup system is a way of storing data in more than one place to protect it from damage and loss. Most backup systems use tape.|
| || ||A character is a letter, digit, or special character (such as $, ?, or *). A field contains a set of related characters. A record is a collection of related fields. A file is a collection of related records. A database is a collection of interrelated files stored together with minimum redundancy; specific data items can be retrieved for various applications.|
| || ||Sequential file processing means that records are in a certain order according to a unique identifier field called a key. If a particular record in a sequential file is wanted, then all the prior records in the file must be read before the desired record is reached.|
| || ||Direct file processing, or direct access, allows the computer to go directly to the desired record by using a record key. Direct processing requires disk storage; a disk device is called a direct-access storage device (DASD). In addition to instant access to any record, an added benefit of direct-access organization is the ability to read, change, and return a record to its same place on the disk; this is called updating in place. Hashing, or randomizing, is the name given to the process of applying a formula to a key to yield a number that represents the address for the record that has that key. A hashing scheme may produce the same disk address, called a synonym, for two different records; such an occurrence is called a collision.|
| || ||Indexed file processing, or indexed processing, stores records in the file in sequential order, but the file also contains an index of keys; the address associated with the key is then used to locate the record on the disk.|
| || ||Three factors determine access time, the time needed to access data directly on disk: seek time, the time it takes to get the access arm into position over a particular track; head switching, the activation of a particular read/write head over a particular track on a particular surface; and rotational delay, the brief wait until the desired data on the track moves under the read/write head. Once data has been found, data transfer, the transfer of data between memory and the place on the disk track, occurs.|
| || ||Access time is usually measured in milliseconds (ms). The data transfer rate, which tells how fast data can be transferred once it has been found, is usually stated in terms of megabytes of data per second.|
| || ||Batch processing is a technique in which transactions are collected into groups, or batches, to be processed at a time when the computer has few online users and thus is more accessible. A master file is a semipermanent set of records. A transaction file, sorted by key, contains all changes to be made to the master file: additions, deletions, and revisions.|
| || ||Transaction processing is a technique of processing transactions in any order they occur. Real-time processing means that a transaction is processed fast enough for the result to come back and be acted upon right away. Online processing means that the terminals must be connected directly to the computer.|
- If you were buying a personal computer today, what would you expect to find as standard secondary storage? What storage might you choose as an option?
- Can you imagine new multimedia applications that take advantage of sound, photos, art, and perhaps video?
- Provide your own example to illustrate how characters of data are organized into fields, records, files, and (perhaps) databases. If you wish, you may choose one of the following examples: department store, airline reservations, or Internal Revenue Service data.