Wednesday, April 22, 2009

File Structure

To learn about files, we need to understand basic terms used to describe the file hierarchy. The terms we shall cover are byte, data item, record, file and data base.

Byte. A byte is an arbitrary set of eight bits that represent a character. It is the smallest addressable unit in today’s computers.
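As a small illustration of a byte representing a character, the sketch below assumes an ASCII encoding, which the text does not specify. The variable names are chosen only for this example.

```python
# One character stored as one byte (eight bits), assuming ASCII encoding.
char = "A"
encoded = char.encode("ascii")     # a bytes object holding a single byte
byte_value = encoded[0]            # the byte as an integer
bits = format(byte_value, "08b")   # the same byte written as eight bits

print(len(encoded), byte_value, bits)
```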

Data item (element). One or more bytes are combined into a data item to describe an attribute of an object. For example, if the object is an employee, one attribute may be sex, name, age, or social security number. A data item is sometimes referred to as a field. Strictly speaking, a field is the physical space on tape or disk, whereas a data item is the data stored in that field.

Record. The data items related to an object are combined into a record. A hospital patient (the object) has a record with his/her name, address, health insurance policy, and next of kin. Each record has a unique key or ID number. The patient's tag number, insurance policy number, or some other unique number could be used as an identifier for processing the record.
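A record and its unique key can be sketched as follows. The class name, the field names, and the sample values are hypothetical and exist only to mirror the hospital patient example. The dictionary stands in for whatever mechanism a real system would use to locate a record by key.

```python
from dataclasses import dataclass

@dataclass
class PatientRecord:
    patient_id: int          # unique key identifying the record (hypothetical)
    name: str                # each remaining attribute is one data item
    address: str
    insurance_policy: str
    next_of_kin: str

# Records indexed by their unique key.
records_by_key = {}

def add_record(record):
    """Store a record, refusing a key that is already in use."""
    if record.patient_id in records_by_key:
        raise ValueError(f"duplicate key: {record.patient_id}")
    records_by_key[record.patient_id] = record

add_record(PatientRecord(1001, "Pat Doe", "12 Elm St", "HP-555", "Sam Doe"))
found = records_by_key[1001]   # the key retrieves exactly one record
print(found.name)
```

Rejecting a duplicate key is what keeps the identifier unique, so each lookup returns a single record.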

In record design, we distinguish between logical and physical records. A logical record maintains a logical relationship among all the data items in the record. It is the way the program or user sees the data. In contrast, a physical record is the way data are recorded on a storage medium. The programmer does not know about the physical "map" on the disk. The software presents the logical records in the required sequence. This capability is unique to data base design.
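The difference between the two views can be sketched with Python's standard `struct` module. The byte layout below (a 4-byte ID, a 20-byte name field, and a 1-byte age) is an assumption made up for this example, not a layout taken from the text. The dictionary plays the role of the logical record, and the packed bytes play the role of the physical record.

```python
import struct

# Hypothetical physical layout: 4-byte unsigned ID, 20-byte name field,
# 1-byte age, little-endian with no padding (25 bytes in total).
LAYOUT = struct.Struct("<I20sB")

def to_physical(emp_id, name, age):
    """Pack a logical record into its fixed-width byte form."""
    # Names shorter than 20 bytes are padded with zero bytes by struct.
    return LAYOUT.pack(emp_id, name.encode("ascii"), age)

def to_logical(raw):
    """Unpack the stored bytes into the view a program works with."""
    emp_id, name_field, age = LAYOUT.unpack(raw)
    name = name_field.rstrip(b"\x00").decode("ascii")
    return {"id": emp_id, "name": name, "age": age}

raw = to_physical(7, "Ada", 36)
record = to_logical(raw)
print(len(raw), record)
```

The name occupies a 20-byte field on the medium even though the data item itself is only three characters long, which also shows the field versus data item distinction made earlier.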

File. A collection of related records makes up a file. The size of a file is limited by the size of memory or the storage medium. Two characteristics determine how files are organized: activity and volatility.

File activity specifies the percentage of records actually processed in a single run. If only a small percentage of records is accessed at any given time, the file should be organized on disk for direct access. In contrast, if a large percentage of records is affected regularly, storing the file on tape would be more efficient and less costly.

File volatility addresses how much the records change. Files whose records change substantially are highly volatile, and disk design is more efficient for them than tape. Compare an airline reservation system, which is highly volatile because of cancellations, additions, and other transactions, with a traditional payroll file, which is relatively dormant. The higher the volatility, the more attractive disk design becomes.
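File activity, as defined above, is a simple percentage and can be computed as in the sketch below. The function names are hypothetical, and the 50 percent cutoff is an arbitrary assumption for illustration. The text gives no numeric threshold, and the sketch ignores volatility.

```python
def file_activity(records_processed, total_records):
    """Percentage of the file's records processed in a single run."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return 100.0 * records_processed / total_records

def suggest_organization(activity_pct, threshold_pct=50.0):
    """Rough suggestion based on activity alone.

    threshold_pct is an assumed cutoff, not a value from the text.
    """
    if activity_pct < threshold_pct:
        return "direct access (disk)"
    return "sequential (tape)"

low = file_activity(50, 1000)     # few records touched per run
high = file_activity(900, 1000)   # most records touched per run
print(low, suggest_organization(low))
print(high, suggest_organization(high))
```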

Data base. The highest level in the hierarchy is the data base. It is a set of interrelated files for real-time processing. It contains the necessary data for problem solving and can be used by several users accessing data concurrently. Data bases are covered later in the chapter.
