DATACOMPUTER PROJECT
SEMI-ANNUAL TECHNICAL REPORT
July 1, 1976 - December 31, 1976
Contract No. MDA903-74-C-0225
ARPA Order No. 2687
Approved for public release; distribution unlimited.
Submitted to:
Defense Advanced Research Projects Agency
1400 Wilson Boulevard
Arlington, Virginia 22209
Computer Corporation of America
575 Technology Square
Cambridge, Massachusetts 02139
DATACOMPUTER PROJECT
SEMI-ANNUAL TECHNICAL REPORT
July 1, 1976 to December 31, 1976
This research was supported by the Defense Advanced Research
Projects Agency of the Department of Defense and was moni-
tored by the U.S. Army Research Office, Defense Supply
Service -- Washington under Contract No. MDA903-74-C-0225.
The views and conclusions contained in this document are
those of the authors and should not be interpreted as neces-
sarily representing the official policies, either expressed
or implied, of the Defense Advanced Research Projects Agency
or the U.S. Government.
3 Datacomputer Usage 23
3.1 Seismic Usage 23
3.2 DFTP 24
3.3 IMP statistics 26
3.4 SURVEY 27
3.5 ACCAT 28
3.6 ERDA 29
3.7 Message Archiving 30
3.8 NSW 30
3.9 Accounting 31
Semi-Annual Technical Report
Datacomputer Project
Table of Contents
4 Software Development 33
4.1 Services 33
4.1.1 Staging 33
4.1.2 Terabit Memory (TBM) Support 35
4.1.3 Accounting 36
4.1.4 Directory Security 37
4.1.5 Maintenance / Testing 38
4.2 The Request Handler 38
4.2.1 Datacomputer Version 2 39
4.2.2 Datacomputer Version 3 42
4.2.3 Maintenance and Testing 46
5 Hardware / Site / Operations 47
5.1 Site Improvements 47
5.2 The TBM 47
5.3 Operations 49
6 Seismic Data Base Support 51
6.1 Overview 51
6.2 The SIP and Network Considerations 52
7 Summary 54
7.1 Services 55
7.1.1 Tertiary Memory Support 55
7.1.2 Utility Support Programs
7.1.3 Security and Accounting
7.2 Request Handler
7.2.1 Data Description
7.2.2 Data Manipulation
7.2.3 Efficiency
Chapter 1
Introduction
This report describes our work on the Datacomputer
system from July 1, 1976 through December 31, 1976. The
project is supported by the Information Processing Techniques
Office of the Advanced Research Projects Agency of
the Department of Defense. The current work is being
carried out under contract MDA903-74-C-0225. Related work
discussed herein is supported by the Nuclear Monitoring Research
Office of ARPA under contract MDA903-74-0227.
Work during the reporting period falls into the
following categories: production operation of Datacomputer
Version 1; preparation and release of Datacomputer Version
2, the first version of the Datacomputer to support an Ampex
Terabit Memory System; production operation of Version 2;
and preparation and release of Datacomputer Version 3.
Chapters 2 - 6 provide detailed descriptions of this
work. Chapter 2 is a discussion of the Datacomputer archi-
tecture, with emphasis on the increasing levels of function-
al abstraction beginning with the hardware and moving
outward. Chapter 3 is a report on the usage of the Datacom-
puter during the reporting period. Chapter 4 is a detailed
discussion of the work on the Datacomputer software.
Chapter 5 describes Datacomputer site, hardware, and opera-
tions work. Chapter 6 is a brief overview of the NMRO work
and its implications for the Datacomputer in general.
Finally, Chapter 7 presents a brief summary of technical de-
velopment over the duration of this contract.
Chapter 2
System Description
The Datacomputer is a very large scale data storage fa-
cility with substantial data management capabilities. Its
design is optimized for use as a data resource in a network
of large-scale computers which are connected via medium
speed (50,000 bits/second) communications lines. The data
storage functions of the Datacomputer will support the
storage of data sets over a trillion bits, and the hardware
facilities include an expandable Ampex TBM currently config-
ured for 200 billion bits of storage (other mass storage
systems could also be used).
The development of the Datacomputer has been strongly
affected by its nature as a network data utility. Its
design does not preclude very fast data transfers -- the
Datacomputer can feed data to the network or some other in-
terface at speeds approximating the bandwidth of its storage
devices, as long as no special processing is required -- but
this is not how it operates in the most frequent case. The
combination of very large storage capacities and only moderately
fast communications facilities implies that the Datacomputer
must provide powerful facilities for data selection
and subsetting, in particular, to minimize data transmission
back and forth through the network. (To transmit a trillion
bits through a 50,000 bits/second channel requires about 231
days, assuming no errors or interruptions!) Furthermore, in
order to make simple changes to large numbers of records, a
facility is necessary for self-contained requests which
modify files without data transmission over the network.
(1) An earlier Semi-annual Technical Report contained a
lengthy tutorial section titled "System Description," regarding
which a number of positive comments have been
received. In the interest of making the present document
self-contained, a somewhat shortened and updated version of
the same material is included here.
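The transfer-time figure quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name transfer_days is ours, and the calculation assumes a perfectly utilized, error-free channel, as the text does.

```python
# Back-of-the-envelope check of the transfer time for a trillion bits.
BITS_TO_SEND = 10**12          # "a trillion bits"
LINE_RATE_BPS = 50_000         # 50,000 bits/second channel
SECONDS_PER_DAY = 24 * 60 * 60

def transfer_days(bits: int, rate_bps: int) -> float:
    """Days needed to push `bits` through an error-free channel of `rate_bps`."""
    return bits / rate_bps / SECONDS_PER_DAY

days = transfer_days(BITS_TO_SEND, LINE_RATE_BPS)
print(f"{days:.1f} days")      # prints "231.5 days"
```

Any selection or subsetting done at the Datacomputer reduces the first argument, which is why the design favors doing that work before data enters the network.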
The Datacomputer's existence in a network environment
also implies that other computer systems mediate between the
Datacomputer and its ultimate human users. Consequently,
functions that are not intrinsic to data management, such as
carefully human-engineered terminal interfaces, are rele-
gated to those other systems. This has several benefits:
it allows work at the Datacomputer to concentrate on issues
intrinsic to data storage and management. It allows other
systems which concentrate on human engineering to provide
better interfaces than would otherwise be produced. Most
importantly, it ensures that the Datacomputer is developed
as a resource available to other systems, capable of being
incorporated into larger projects. Careful attention is
paid to such issues as error flagging and resynchronization
after serious error detection.
Many large computer systems may be usefully examined in
terms of their functional hierarchies. A level may be char-
acterized in two ways. First, more fundamental operations
which are provided by the previous level (and may already be
abstractions themselves) are combined into new, more power-
ful, and more abstract operations. For example, the stream
of magnetic flux reversals seen by the disk controller
becomes a stream of fixed (or variable) length blocks of
binary words when seen by the operating system. A subset of
this arbitrary collection of unformatted words is presented
to user programs as a "file" in a "file system". Second,
intermediate functions exist to prevent certain combinations
of operations which would damage system integrity from oc-
curring, and to hide other functions entirely from the next
level out.
The term normally used for the particular collection of
functions available to any given level of a system hierarchy
is "virtual machine". In many ways, the programmer working
at level n in such a system may behave as if level n-1 were
hardware; all n-1 functions are immutable and part of the
machine environment. Using terms that will be explained in
the rest of this section, the TENEX implementer programs a
PDP-10; the SV programmer programs a TENEX (which looks a
lot like a PDP-10 with some major abstractions); the Request
Handler programmer programs an SV machine, and the ultimate
user programs a Datacomputer. (We shall see that the set of
functions presented by the Request Handler is equivalent to
the Datacomputer virtual machine.)
The following levels will be discussed in detail:
1) The hardware consists of a Digital Equipment Cor-
poration (DEC) PDP-10 and its supporting peripher-
als including communications links to the ARPA
Network and a very large storage device.
2) The TENEX operating system is in direct control of
the hardware resources and provides many services
to the Datacomputer.
3) The programs known collectively as SV or Services
are a pseudo-operating system which interacts with
TENEX, managing input/output, scheduling, and
storage strategies for Datacomputer files.
4) The Request Handler (RH) is the interface to ex-
ternal processes using the Datacomputer. It
accepts control and data-management statements in
"Datalanguage", provides messages concerning the
state of the Datacomputer job to the user, and
supervises data flow in both directions under the
control of Datalanguage statements, and with the
help of the other levels of the system.
Above these four levels of processing, there are an in-
definite number of levels of functioning, outside the actual
Datacomputer. These are the processes on other machines
which interface to the Datacomputer, and co-operate with it
in the accomplishment of whatever tasks its ultimate users
undertake.
2.2 The Hardware Level
Conceptually, the hardware for a Datacomputer is quite
simple. A processor of some sort is required along with
some form of primary store (e.g., core). In addition, one
needs a very large store (e.g., TBM) and a medium-to-high
speed communications port. A great deal of efficiency can
be gained by adding one or more levels of intermediate
storage such as disk.
The hardware base of the Datacomputer as it is currently
implemented consists of a processor, an address mapping
device, three levels of store, medium and low speed communi-
cations lines, and several I/O devices.
The processor is a Digital Equipment Corporation KA-10
CPU (PDP-10). A Bolt Beranek and Newman "Pager" provides
address translation for all memory references, and (along
with software in TENEX) provides the illusion of a 256K (1K
= 1024) word primary store regardless of the size of the
physical memory.
The real primary store in the current Datacomputer is
336K words of 36 bit core memory. This includes five 16K
DEC ME-10's and two 128K STOR-10's from Cambridge Memories,
Inc. PDP-10 characters are typically stored five to a word,
so this is the equivalent of slightly more than a million
characters of memory.
The system has two types of secondary store. Six
spindles of DEC RP02 disk (IBM 2314 equivalent) provide
space for the TENEX file system. These hold about four
million 36 bit words each, for a total of 24 million words.
In addition, four spindles of CalComp 230 disk (IBM 3330
equivalent) are attached to the PDP-10 via a Systems
Concepts SA-10 IBM data channel simulator. These disks each
will store approximately 20 million words of data and are
used for staging devices between the tertiary store and the
PDP-10.
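The primary and secondary store sizes given above follow from the module and spindle counts. The short sketch below simply reproduces that arithmetic; the variable names are ours, and the per-spindle disk capacities are the approximate figures from the text.

```python
WORDS_PER_K = 1024
CHARS_PER_WORD = 5            # "typically stored five to a word"

# Core memory modules as listed in the text: (count, size in K words).
core_modules = [(5, 16), (2, 128)]
core_k_words = sum(count * size for count, size in core_modules)
core_chars = core_k_words * WORDS_PER_K * CHARS_PER_WORD

# Disk spindles: count times approximate millions of 36-bit words each.
rp02_mwords = 6 * 4           # TENEX file system
calcomp_mwords = 4 * 20       # staging devices

print(core_k_words)                  # 336
print(core_chars)                    # 1720320
print(rp02_mwords, calcomp_mwords)   # 24 80
```

The 80 million word staging total is not stated in the text; it is derived here from the four spindles of roughly 20 million words each.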
The main data repository of the Datacomputer is its
tertiary store, an Ampex Tera-Bit Memory System also attached
through the SA-10. The TBM is described in detail in
Chapter 5 (Hardware / Site / Operations).
The Datacomputer's communications equipment consists of
a connection to an Interface Message Processor (IMP) which
is in turn connected to the Arpanet. The Arpanet connection
is the Datacomputer's only channel to the outside world.
All Datacomputer usage consists of messages and data passed
back and forth through this port. Nodes in the Arpanet are
connected by up to four 50,000 bit/second phone lines, and
the combined traffic of all concurrent Datacomputer users is
limited by this (except for the special case of usage from
another host connected to the same IMP).
2.3 The Primary Operating System - TENEX
The second level in the Datacomputer's functional hierarchy
is the TENEX operating system. An excellent overview
of the nature and facilities of TENEX can be found in
Bobrow et al. (1).
The virtual machine provided by TENEX - a PDP-10 arithmetic
processor with full memory capabilities and file/process
address space integration - has proved to be satisfactory
for Datacomputer development. However, changes to
the TENEX monitor have been necessary to optimize the
Datacomputer's performance. The Datacomputer's Network Control
Program has been modified to optimize voluminous data transfers
rather than the high volume of short messages (due to
user terminal I/O) that is more typical of TENEX network
traffic. The scheduler was modified to give special consideration
to the resource utilization patterns of the Datacomputer.
For example, there is little urgency in the Datacomputer
to satisfy highly interactive jobs, but it is necessary
to respond promptly to events like high-speed disk
operations. Higher throughput with reduced overhead is
achieved by rescheduling at shorter intervals than in a
normal TENEX, but giving larger quanta of CPU time when jobs
are scheduled. Routines to support the CalComp 230 disk
drives and the TBM have been added.
(1) Bobrow, Daniel G., et al., "TENEX, a Paged Time Sharing
System for the PDP-10," Communications of the ACM, Vol. 15,
No. 3, March 1972, pp. 135-143.
2.4 The
The preceding two levels of the Datacomputer system
were not products of the development effort being discussed.
They are described here because an understanding of their
functions and capabilities is useful to understanding the
functions and capabilities of the two outer layers of the
system - Services and the Request Handler.
These two levels constitute what could reasonably be
called the "Datacomputer proper", and are the primary output
of the Datacomputer project. They are conceptually and
functionally separate - to the point of having separate personnel.
This section discusses the Services programs (hereafter
known interchangeably, and in accordance with time-honored
tradition, as SV).
SV functions as a pseudo operating system for the Data-
computer. It provides the basic functions of a traditional
operating system in a form which is maximally convenient for
the construction of a user-level Datacomputer interface (of
which the current Request Handler is but one example). In
particular, SV provides a specialized file system, stream
oriented input/output facilities, and a set of scheduling/-
monitor functions.
The special disk area index module, known as SDAX,
serves two critical functions. First, it controls the
staging mechanism which brings required data pages from rel-
atively slow tertiary store (TBM) to relatively fast second-
ary store (3330 and RP02 disk). Pages thus stored are kept
on disk as long as they are in use, and are migrated back to
TBM only after they are no longer needed by active users.
The second function of SDAX is to provide a mechanism for
permitting access to several versions of a file by any
number of users. It does this by a map-chaining technique
which permits multiple readers and a concurrent updater to
access any number of active versions of a file.
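This report does not spell out how SDAX's map-chaining works. As a rough illustration of the general idea, the sketch below assumes each file version is a page map chained in front of the older maps, so that a new version stores only the pages it changes. It uses Python's standard collections.ChainMap as a stand-in; the page contents and variable names are invented for the example.

```python
from collections import ChainMap

# Version 1 of a file: page number -> page contents.
v1 = ChainMap({0: "header-v1", 1: "data-v1"})

# A reader attaches to the version that is current when it opens the file.
reader_view = v1

# The updater starts a new version: an empty map chained in front of v1.
v2 = v1.new_child()
v2[1] = "data-v2"        # only the modified page is stored in the new map

assert reader_view[1] == "data-v1"   # the reader still sees the old version
assert v2[1] == "data-v2"            # the updater sees its own change
assert v2[0] == "header-v1"          # unmodified pages fall through the chain
```

Under this assumption, readers never block the updater, and an old version can be discarded once no reader still refers to its map.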
Access to SV functions for the Request Handler is via a
special instruction known as "SVCALL". SVCALL's exist to
manipulate the state of Datacomputer files including reading
and writing pages from them; to perform input and output
over the Datacomputer's Arpanet connections; and to handle
special error conditions.
2.4.1 The SV File System
The primary function of SV is to provide a convenient
interface to the data storage facilities of the Datacomputer:
the tertiary store and the staging device. A Datacomputer
file as seen by the Request Handler programmer consists
of a number of "sections", each of which is an ordered
set of pages. (For convenience, SV pages are the same size
as TENEX pages - 512 36 bit words.)
2.4.2 The Directory System
The Datacomputer file system may be thought of as a
tree-structured hierarchy. At the top of the tree is a node
whose conventional name is "%TOP". There are two types of
nodes in the directory system - terminal and non-terminal.
Terminal nodes (files) contain only data, and non-terminal
nodes (directories) contain other nodes which exist at a
lower level in the tree. Node creation is independent of
the intended use of the node. In other words, a node in the
tree is created, then at a later time it is specified
whether it is terminal (a file) or non-terminal (a direc-
tory). Levels of the hierarchy are specified as a list of
names connected by periods, such as "%TOP.DFTP.CCA". In the
example, %TOP and DFTP are non-terminal nodes, and CCA may
be either terminal or non-terminal (in the example, not
enough context is present to determine which).
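The directory behavior just described, with period-separated pathnames and nodes whose type is fixed only after creation, can be pictured with a small tree structure. This is a sketch under our own assumptions, not the SV implementation: the Node class and create function are hypothetical, the top node name is taken to be "%TOP", and a node is marked as a directory as soon as something is created beneath it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

TOP_NAME = "%TOP"   # top node name as we read it from the text (assumption)

@dataclass
class Node:
    name: str
    kind: Optional[str] = None            # None until declared "file" or "directory"
    children: Dict[str, "Node"] = field(default_factory=dict)

def create(root: Node, pathname: str) -> Node:
    """Create (or find) the node named by a period-separated pathname."""
    names = pathname.split(".")
    if names[0] != root.name:
        raise ValueError("pathname must start at the top node")
    node = root
    for name in names[1:]:
        if node.kind == "file":
            raise ValueError(f"{node.name} is a terminal node")
        node.kind = "directory"           # holding other nodes makes it non-terminal
        node = node.children.setdefault(name, Node(name))
    return node

top = Node(TOP_NAME)
cca = create(top, TOP_NAME + ".DFTP.CCA")
print(cca.kind)                           # None: not yet file or directory
print(top.children["DFTP"].kind)          # directory
```

As in the text's example, nothing about the pathname itself says whether CCA will end up as a file or a directory.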
In addition to maintaining the directory hierarchy, the
directory system provides protection for contents of nodes,
whether other nodes or data. This protection takes the form
of a set of "privilege tuples" associated with each node. A
privilege tuple describes two things: the set of privileges
allowed (or denied) to the user accessing the node, and the
specification of the class of users to whom this particular
set of privileges applies.
Just to give the flavor of privilege tuple application,
one tuple might specify that, for a particular node, a user
may login to the node and create new nodes under it, but
only if connected to the Datacomputer from socket number
1000001 on ARPANET host number 31. Another privilege tuple
might grant the same privileges to any user who knows that
the password is "WASHINGTON". A third might only grant read
access to files under that node to users giving the password
"DC". For a full discussion of privilege tuples, please
refer to the latest Datalanguage manual.
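The three example tuples above can be modeled as records pairing a set of privileges with a description of the user class. The sketch below is our own simplification: the field names, the privilege names, and the first-match rule are assumptions, and denial of privileges is not modeled. The Datalanguage manual remains the authority on the real semantics.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence

@dataclass(frozen=True)
class PrivilegeTuple:
    privileges: FrozenSet[str]
    host: Optional[int] = None       # None means "any host"
    socket: Optional[int] = None     # None means "any socket"
    password: Optional[str] = None   # None means "no password required"

    def applies_to(self, host: int, socket: int, password: Optional[str]) -> bool:
        return ((self.host is None or self.host == host)
                and (self.socket is None or self.socket == socket)
                and (self.password is None or self.password == password))

def granted(tuples: Sequence[PrivilegeTuple], host: int, socket: int,
            password: Optional[str]) -> FrozenSet[str]:
    """Return the privileges of the first tuple whose user class matches."""
    for t in tuples:
        if t.applies_to(host, socket, password):
            return t.privileges
    return frozenset()

node_tuples = [
    PrivilegeTuple(frozenset({"login", "create"}), host=31, socket=1000001),
    PrivilegeTuple(frozenset({"login", "create"}), password="WASHINGTON"),
    PrivilegeTuple(frozenset({"read"}), password="DC"),
]

print(sorted(granted(node_tuples, 31, 1000001, None)))   # ['create', 'login']
print(sorted(granted(node_tuples, 5, 7, "DC")))          # ['read']
```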
This external view of the Datacomputer's file system -
a tree-structured hierarchy with multiple protection classes
enforced on each node in the tree - is dealt with transpar-
ently by the Request Handler. This means that the structure
seen by, and the functions available to the ultimate Data-
computer user are essentially the same as those provided by
Services to the Request Handler.
2.4.3 Access to Datacomputer Files
As mentioned above, a Datacomputer file is stored as a
number of sections, each of which is broken into 512 word
blocks called pages. When the Request Handler wishes to
access some page of a Datacomputer file, the following
sequence of events must take place:
1) The file is opened. To open a file, RH supplies
SV with the string representing the file's path-
name in the Datacomputer file system (along with
any needed passwords). SV determines that the
current user is allowed to access the file in the
manner requested (and the file exists), then
returns a small integer, known as a Relative File
Number or RFN. The RFN is the handle used by RH
in all future references to the file until it is
closed, at which time the RFN becomes invalid.
2) A buffer is allocated in the user process's
address space. Buffers are managed by SV, but
their allocation, freeing, and use is under the
control of RH. A buffer is exactly the same size
as a Datacomputer file page (and of a TENEX page).
The buffer is identified by yet another small
integer returned by SV.
3) If the page is being read (data already exists and
is being referenced), an SVCALL known as PGRD is
executed. This takes the RFN of the file, the
section number, the page number within the section,
and the buffer number into which the page
is to be read as inputs. After the call, the page
is available in the buffer.
4) If the page is being created, data is first
entered into the buffer by the Request Handler,
then the page is written to the file by the SVCALL
PGWR. Arguments are the same as with PGRD.
5) If the page is being modified, the sequence is
PGRD, modify, PGWR.
6) When the Request Handler is through with the
buffer and the file, the buffer is released by an
explicit SVCALL, and the file is closed.
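The sequence above can be walked through with a toy in-memory stand-in for Services. Only the names PGRD and PGWR come from the text; the MockSV class, its other method names, and the pathname are hypothetical, and privilege and existence checks are omitted (the mock simply creates the file on open).

```python
PAGE_WORDS = 512  # SV pages are 512 36-bit words

class MockSV:
    """Toy stand-in for Services; not the real SVCALL interface."""
    def __init__(self):
        self.files = {}       # pathname -> {(section, page): list of words}
        self.open_files = {}  # RFN -> pathname
        self.buffers = {}     # buffer number -> list of words
        self._next_rfn = 0
        self._next_buf = 0

    def open(self, pathname):
        self.files.setdefault(pathname, {})   # simplification: create if missing
        rfn = self._next_rfn
        self._next_rfn += 1
        self.open_files[rfn] = pathname
        return rfn

    def close(self, rfn):
        del self.open_files[rfn]              # the RFN becomes invalid

    def alloc_buffer(self):
        buf = self._next_buf
        self._next_buf += 1
        self.buffers[buf] = [0] * PAGE_WORDS
        return buf

    def free_buffer(self, buf):
        del self.buffers[buf]

    def pgrd(self, rfn, section, page, buf):
        """Read a stored page into the given buffer."""
        stored = self.files[self.open_files[rfn]][(section, page)]
        self.buffers[buf][:] = stored

    def pgwr(self, rfn, section, page, buf):
        """Write the buffer's contents to the given page of the file."""
        self.files[self.open_files[rfn]][(section, page)] = list(self.buffers[buf])

sv = MockSV()
rfn = sv.open("EXAMPLE.FILE")      # step 1: open, receive an RFN
buf = sv.alloc_buffer()            # step 2: allocate a buffer
sv.buffers[buf][0] = 42            # step 4: fill the buffer, then PGWR
sv.pgwr(rfn, 0, 0, buf)
sv.pgrd(rfn, 0, 0, buf)            # step 5: PGRD, modify, PGWR
sv.buffers[buf][1] = 7
sv.pgwr(rfn, 0, 0, buf)
sv.free_buffer(buf)                # step 6: release the buffer, close the file
sv.close(rfn)
```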
2.4.4 The SV Input/Output System and Monitor
The input/output and monitor facilities provided by
Services are fairly simple when compared with the directory
system. Input/output consists primarily of a set of connections
to the ARPANET, with the ability to read and write
buffers of data to/from a given connection. A special set
of SVCALL's are provided for communication with the Datacom-
puter operator's console. The operator is consulted before
particularly large requests are executed for user jobs, and
certain kinds of messages about the state of the Datacom-
puter are routed there.
The Services monitor provides no particular facilities
of its own, but is responsible for the creation/destruction
of TENEX forks which represent particular Datacomputer sub-
jobs. As users contact the Datacomputer via the network,
they are assigned to a particular sub-job by the master
process known as "Job 0", which is just like any other Data-
computer process, except it has the monitor code enabled.
2.5 The User's Level - RH
The "outermost" level of the Datacomputer is known as
the Request Handler. RH is in some sense an application
program, since it is possible for a reasonably naive user to
interact directly with it, via a specialized data-management
language known as "Datalanguage". It would not be unreasonable
to consider Datalanguage as the Datacomputer's order
code.
The Datacomputer maintains one or more input/output
channels for the user. These are called "ports". All Data-
language interactions flow over a particular port known as
the "default port" or the "Datalanguage port". This port is
the connection established when the user first contacts the
Datacomputer from the ARPANET. Data may flow over the
default port or over auxiliary ports which are created by
Datalanguage statements as the session progresses. It is
preferable to use auxiliary ports for data for two reasons:
first, only ASCII data may pass through the default port;
and second, even though the data being passed is ASCII, care
must be taken to insure that it contains no characters which
are treated specially when passed through the default port.
Datalanguage statements fall into two categories:
commands and requests. In general, commands control the
state of the user's Datacomputer process: they open and close
files, create nodes, modify privilege tuples, etc. Requests
refer directly to the contents of files. A large part of
Datalanguage is devoted to the detailed description of the
contents of files, and the Request Handler makes extensive
use of such descriptions in planning its actions.
When the user first connects to the Datacomputer,
Services initializes a new Datacomputer process, then passes
control to the Request Handler. RH does some initialization
of its own, then asks SV for the next line of input from the
Datalanguage port. If the input line is a command, it is
executed immediately. Requests are compiled, then executed.
The Request Handler's compiler is invoked for most requests.
(A special subset of easy-to-handle requests is
interpreted by a special module known as "SLURP".) The compiler
consists of three parts.
1) The first phase of compilation is handled by a
routine known as the "pre-compiler". The pre-
compiler takes the request as received from the
user, does validity/syntax checking, and produces
a new representation of the request known as "in-
termediate language". Intermediate language con-
sists of a set of functions which are an abstract
description of the entire set of operations which
are legal on Datacomputer data. These functions
are essentially the low-level machine language of
the Datacomputer. They represent elementary oper-
ations such as "move an item from container 1 to
container 2" with appropriate ancillary informa-
tion such as the type and location of containers 1
and 2. Most of the "smartness" of the Request
Handler lies in the pre-compiler. It is completely
responsible for the syntactic and semantic interpretation
of user requests (but not their execution).
2) After the pre-compiler has abstracted and simpli-
fied the request, the intermediate language gener-
ated and the descriptions of the real files which
are named in the request are fed to the rest of
the compiler. This section is responsible for
generating the instructions for actually moving
data from one file (or port) to another under the
control of the request. The output of this phase
of the compiler is a data structure which contains
all the messy loops, skips, and such for plowing
through and pulling the data specified in the
format requested from the file. The descriptions
of these operations are called "tuples".
3) Finally, the routines which actually execute the
request on the data are, in some sense, part of
the compiler. Many of the tuples have distinct
sub-routines which are responsible for their exe-
cution, and those routines constitute both the
run-time environment and part of the compile-time
data base of the compiler. Because of the multitude
of data-types, byte sizes, etc. allowed by
the Datacomputer, each tuple has many "modes",
which are identified by bits in the data struc-
ture. For any given request, a particular set of
modes is used, and a particular subset of the
tuple code is executed. The last phase of the
compiler walks through the tuple list that defines
the request, and extracts the instructions which
perform the tuple functions as constrained by the
active mode bits in the tuples, producing the
final "compiled request", which is executed with
the real data.
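The last phase described above, in which mode bits select which parts of a tuple's code end up in the compiled request, can be illustrated with a toy model. Everything in this sketch is hypothetical: the single MOVE tuple, the two mode bits, and the string items are invented to show the selection step and are not drawn from the Request Handler itself.

```python
from enum import IntFlag
from typing import Callable, List, Tuple

class Mode(IntFlag):
    NONE = 0
    UPPERCASE = 1      # hypothetical mode bit: convert text to upper case
    PAD_TO_8 = 2       # hypothetical mode bit: pad the item to 8 characters

# A "tuple" here is (operation name, active mode bits).
TupleOp = Tuple[str, Mode]

def specialize_move(modes: Mode) -> Callable[[str], str]:
    """Select, once at compile time, only the steps the mode bits call for."""
    steps: List[Callable[[str], str]] = []
    if modes & Mode.UPPERCASE:
        steps.append(str.upper)
    if modes & Mode.PAD_TO_8:
        steps.append(lambda s: s.ljust(8))

    def move(item: str) -> str:
        for step in steps:
            item = step(item)
        return item
    return move

def compile_request(tuples: List[TupleOp]) -> List[Callable[[str], str]]:
    """Walk the tuple list and emit one specialized routine per tuple."""
    compiled = []
    for op, modes in tuples:
        if op != "MOVE":
            raise ValueError(f"unknown tuple {op}")
        compiled.append(specialize_move(modes))
    return compiled

def execute(compiled: List[Callable[[str], str]], items: List[str]) -> List[str]:
    """Run the compiled request over the real data."""
    out = []
    for item in items:
        for routine in compiled:
            item = routine(item)
        out.append(item)
    return out

request = compile_request([("MOVE", Mode.UPPERCASE | Mode.PAD_TO_8)])
print(execute(request, ["abc", "seismic"]))   # ['ABC     ', 'SEISMIC ']
```

The point of the split is that the mode tests run once per request during compilation rather than once per data item during execution.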
Chapter 3
Datacomputer Usage
The dominant fact in usage of the Datacomputer during
the latter half of 1976 was the release of Version 2, with
its incorporation of the terabit memory system. This fulfilled
the Datacomputer's promise of very large, cost-effective
on-line storage, and use of the Datacomputer reflected
that reality. The number of bits stored has increased
dramatically; projects which had filled their available
allocations in system development went into production
as the TBM's facilities became available.
3.1 Seismic Usage
The seismic application of the Datacomputer is dis-
cussed in more detail in section 6 of this report; at this
point it suffices to sketch the explosive growth in use of
the Datacomputer as this application went on line. Begin-
ning in October, real-time raw seismic data was fed to the
Datacomputer through the Arpanet at rates of 7 - 12 kilo-
baud, around the clock. By the end of the year, nearly 70
billion bits of seismic data -- raw readings, event summaries,
instrument status reports, and various historical
files -- had been stored in the Datacomputer. If stored on
conventional disk storage, this volume of data would have
required more than 85 spindles of 100-megabyte drives!
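The spindle comparison above, and its consistency with the quoted input rate, can be checked as follows. The sketch treats a kilobaud as a thousand bits per second and assumes about 92 days of round-the-clock input (October through December); both are our assumptions rather than figures from the seismic project.

```python
SEISMIC_BITS = 70e9                 # "nearly 70 billion bits"
DRIVE_BYTES = 100e6                 # one 100-megabyte spindle
BITS_PER_BYTE = 8

spindles = SEISMIC_BITS / BITS_PER_BYTE / DRIVE_BYTES
print(spindles)                     # 87.5

# Cross-check against the quoted input rate of 7 - 12 kilobaud.
SECONDS = 92 * 24 * 60 * 60         # assumed duration of the input stream
low, high = 7_000 * SECONDS, 12_000 * SECONDS
print(low / 1e9, high / 1e9)        # roughly 55.6 and 95.4 billion bits
```

A stored total of nearly 70 billion bits falls inside that range, and 87.5 spindles agrees with the "more than 85" stated in the text.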
As the seismic database grew, researchers began retrievals
against it, initially to develop and test procedures
for use in ongoing seismic studies. A special program
was developed at CCA to assist analysis of seismic data by
providing quick graphs of waveforms from the raw data files;
and the RDC program developed earlier at CCA was used exten-
sively in these inquiries. The Applied Seismology Group at
Lincoln Laboratories also developed several systems for ac-
cessing seismic data through its PDP-11 UNIX system, and
were active users of seismic data during this period.
3.2 DFTP
Growth of user activity in other areas of Datacomputer
service was somewhat less dramatic, but equally real. The
most widespread usage continued to be the DFTP system, which
provides a uniform file archival service for various PDP-10
systems throughout the Arpanet. Several significant devel-
opments occurred with DFTP during this period: The new file
organization discussed in the previous semi-annual report
was implemented, and most users' data was transferred to
that format. Installation of the TBM allowed usage to grow
beyond the artificial bounds imposed by disk space available
in earlier versions of the Datacomputer. A new implementa-
tion effort extended the class of operating systems on which
DFTP is available, and thereby its domain of users.
The new format of DFTP storage collects many files for
a particular user into a single Datacomputer file. The
space savings in the scheme are impressive, but relatively
unimportant (averaging around 60% for most users). More
significant is the reduction in Datacomputer directory overhead
and in TBM accessing which goes along with this more compact
storage. The new format also provides users with additional
directory information and integrity features, and incorporates
several user features which make it considerably
more convenient and effective. By year-end, all but three existing
sites had had their users' data transferred, and the
new version of the program made available to users. The remaining
sites were delayed by coordination difficulties, but
all were transferred before this report was written.
[...]panded to fill their al[...] [...]reased space when the TBM
[...] significant increase in [...] DFTP; as a result, file
storage rose from about 850 to about 1300 megabits in the
second half of the year, and was increasing steadily.
Concurrently with the release of the new DFTP to old
users, versions of the program were made available to users
of the ITS, SAIL, and TOPS-20 operating systems. This pro-
vided complete coverage of the PDP-10 family of systems on
the network; efforts were underway to spread DFTP to
MULTICS and UNIX systems during the period.
In addition to its primary function as an effective
file storage resource for members of the Arpanet community,
DFTP served as an example system for groups investigating
means of accessing the Datacomputer; these included re-
searchers in Facsimile message processing at the University
of London, researchers from several ERDA laboratories, and
the UCLA-Security research group.
3.3 IMP statistics
The Arpanet Network Control Center has been an estab-
lished user of the Datacomputer, storing statistics on per-
formance of the IMPs which implement the network. This ap-
plication continued unabated, and artificial limits on the
amount of data stored were removed with installation of the
TBM. Usage has grown from 200 to 600 megabits during the
period of this report, with steady retrieval activity
against the data. The retrieval programs written at BBN in-
corporate a package of Datacomputer interface subroutines
(DCSUBR) described in the predecessor to this report.
3.4 SURVEY
Another well-established use of the Datacomputer is the
SURVEY application, carried on in conjunction with MIT's
Laboratory for Computer Science. Current survey data on the
status of hosts on the Arpanet continued to be stored at the
rate of about 10,000 probes per day. Efforts also began to
restore SURVEY data which had been migrated off-line due to
space considerations in old versions of the Datacomputer.
By year-end, all four quarters' data for 1976 were restored,
raising usage from 350 megabits to about 600.
The SURVEY database was used during this period by re-
searchers in the Very Large Data Base project at MIT. They
used the Datacomputer's processing of requests against 2
quarters' worth of SURVEY data to test their work in estimating
the cost of query processing in very large databases.
This usage involved approximately 300 more megabits of
storage.
3.5 ACCAT
The ARPA Command and Control Advanced Testbed, being
undertaken at the Naval Electronics Laboratory Center (NELC) in
San Diego, became a major focus of interest in the Datacomputer
in this period. Initial investigations begun at the
Stanford Research Institute in 1975 were continued, with
further demonstrations of the system which interfaced a
natural language query processor to the Datacomputer. The
conversion of the database which supports this system to a
relational format was begun, and steps were also undertaken
to load a much larger command and control database into a
version of the Datacomputer to be brought up at NELC. Data-
computer personnel provided consultation and programming
support for this conversion and loading task; previously
implemented user software (like the DCSUBR package) proved
valuable in this effort. Major research in distributed data
management will be carried out under a separate contract,
using the Datacomputer as the basic DBMS.
3.6 ERDA
Several of ERDA's national laboratories began investi-
gations of the Datacomputer during the reporting period;
most active were groups at the Argonne and Lawrence Berkeley
Laboratories. The major project in this period was the start of
installation of a climatological database by personnel
at Argonne. This database contains 16 files, each with a
year's worth of hourly readings for some U.S. city; the
data are used by a number of sets of programs which model
energy usage in buildings and communities (CAL-ERDA, ATMES,
and ACUC systems).
Interfacing with Argonne personnel provided a test of
the scope of Datacomputer applicability beyond its previous
extent: the database was being transferred from a 370/195
at Argonne, through a Varian network interface, into the
Datacomputer, and later retrieved for processing at
Berkeley's CDC 6600. Despite problems with the network in-
terface at Argonne, this sequence of transfers was success-
fully accomplished with minimal user programming effort and
no modifications to the Datacomputer. It is interesting
that the standard File Transfer Protocol (FTP) programs im-
plemented at Argonne and Berkeley proved sufficient for ac-
cessing the Datacomputer, although they might be replaced by
more specialized interface programs at some later date. The
concept of the Datacomputer as a resource for data sharing
thus received additional validation.
3.7 Message Archiving
Under a separate, short term contract, CCA began in-
vestigations into the use of the Datacomputer for archiving
network messages. Preliminary investigations dealt with in-
terfacing to the various message handling programs now
available on the Arpanet, and designing possible Datacom-
puter implementations which would be consistent with such
systems. Experience gained in implementation and operation
of DFTP proved valuable in this effort.
3.8 NSW
The National Software Works project began work on using
the Datacomputer as the archival system for its historical
databases as storage became available for that purpose.
Initial design relied on existing Datacomputer user support
facilities, such as the DCSUBR package mentioned above. At
the end of the reporting period, consultations were underway
between Datacomputer and NSW personnel on specifics of file
description and accessing characteristics.
3.9 Accounting
As a final task in the implementation of the Datacom-
puter, a system was implemented for accounting for usage.
There were several interesting problems involved here: It
was necessary to choose a set of parameters which at once
constituted an adequate measure of resource utilization and
could be made accessible to the accounting system with rea-
sonable effort. Given this information, the actual process-
ing of that usage data into meaningful accounts was a rela-
tively straightforward reporting task. There were certain
kinds of reporting, however, which were unique to the Data-
computer's status as a network utility. These involve
access to shared databases. Questions arise as to who is
ultimately responsible for the storage and accessing of
shared data; if charging for these functions is separated,
the "owner" of a database may lose information on how it is
being used, and by whom.
Collection of appropriate usage statistics proved
simple, as the Datacomputer already supplied more than
enough statistics in its normal processing; modifications
to the Datacomputer involved simply formatting the appropri-
ate counters in a separate file, and adding routines to
account for storage and directory space used. This informa-
tion is dumped to a file external to the Datacomputer; from
there several straightforward programs manipulate it to
produce usage reports which can serve as the basis for
billing when needed. Shared databases have their storage
and accessing charges separated; however, advisory access
reports are supplied to the official "owners" of data, detailing
times, modes and user identities for access to that
data.
Preliminary reports were produced at year-end, and
regular reporting is expected to begin monthly in 1977.
Chapter 4
Software Development
Description of work on Datacomputer software will follow the
division between the Services (SV) and Request Handler (RH)
sections of the project introduced in Chapter 2.
4.1 Services
The Services work during this reporting period was concentrated
in five areas: staging routines, terabit memory
(TBM) support, accounting, directory security, and maintenance/testing.
4.1.1 Staging
The design and implementation of the staging area replacement
algorithms were accomplished during the first part
of the reporting period. As the staging area or its directory
becomes full, these algorithms are used to determine
which files should be removed to free additional space.
File selection is based on the amount of user generated ac-
tivity and the number of modified pages that must be copied
back to TBM before the file can be deleted. Three algo-
rithms were implemented for the Version 2 release. The pri-
mary algorithm involved the Datacomputer monitor (JobO) per-
iodically running a background process to monitor the
staging area and its associated directory table. When space
used exceeded a certain threshold, the background process
would begin to migrate updated files back to TBM, freeing up
additional space. In the event the background process was
unable to keep up with user demands for staging space, two
overflow algorithms were implemented to enable user subjobs
to free their own space.
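The selection criteria described above can be illustrated with a small sketch. The report does not give the actual scoring formula, so the ordering rule (least active first, then fewest modified pages), the field names, and the threshold below are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class StagedFile:
    name: str
    pages: int            # pages occupied in the staging area
    recent_accesses: int  # user-generated activity (hypothetical measure)
    modified_pages: int   # pages that must be copied back to TBM first


def pick_victims(files, pages_used, threshold):
    """Return names of files to migrate until usage drops to the threshold.

    Assumed ordering: least active first; among equally active files,
    the one with the fewest modified pages (cheapest to write back).
    """
    victims = []
    for f in sorted(files, key=lambda f: (f.recent_accesses, f.modified_pages)):
        if pages_used <= threshold:
            break
        victims.append(f.name)
        pages_used -= f.pages
    return victims


staged = [
    StagedFile("a", pages=400, recent_accesses=9, modified_pages=0),
    StagedFile("b", pages=300, recent_accesses=1, modified_pages=250),
    StagedFile("c", pages=300, recent_accesses=1, modified_pages=10),
]
victims = pick_victims(staged, pages_used=1000, threshold=500)
```

In this example the two idle files are chosen before the active one, and the file with fewer modified pages is chosen first because it costs less to write back.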
From observing the operation of the Version 2 Datacom-
puter, it was discovered that JobO was not always able to
run the staging area monitor process often enough to prevent
staging area overflows. This problem was solved in the
Version 3 Datacomputer by moving the staging area monitor to
a fork inferior to JobO. The staging routines were also
enhanced to support multiple staging disks. This gave the
Version 3 Datacomputer a staging area capacity of up to
120,000 pages. Several new operator commands were imple-
mented to facilitate operator control of the staging area.
These included staging area status and various commands to
enable staging disks to be dynamically added or deleted from
the Datacomputer. For users with applications requiring
fast access to files, a capability was added to allow files
to be "frozen" in the staging area. A "frozen" file can
remain staged indefinitely, thus avoiding the overhead that
might be caused by re-staging.
4.1.2 Terabit Memory (TBM) Support
One utility was modified and several new utilities
written to support TBM operation. The Directory Cross
Checker (XD) is a utility that verifies the internal consis-
tency of the Datacomputer Directory. XD was modified to
support off-line TBM Volumes and to build the enormous bit
patterns required to verify that no two Datacomputer files
are inadvertently mapped to the same TBM block.
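The double-mapping check can be pictured as a bitmap with one bit per TBM block. The sketch below is only an illustration of that idea; the storage-map layout (a list of block numbers per file) and the function name are assumptions, not the XD implementation.

```python
def find_double_mapped(storage_maps, total_blocks):
    """Return (file, block) pairs where a block was already claimed.

    storage_maps: {file_name: [block numbers]} (assumed layout).
    """
    in_use = bytearray((total_blocks + 7) // 8)  # one bit per TBM block
    conflicts = []
    for name, blocks in storage_maps.items():
        for block in blocks:
            byte, bit = divmod(block, 8)
            mask = 1 << bit
            if in_use[byte] & mask:
                conflicts.append((name, block))  # second claim on this block
            in_use[byte] |= mask
    return conflicts


maps = {"x": [0, 1, 9], "y": [2, 9], "z": [3]}
conflicts = find_double_mapped(maps, total_blocks=16)
```

Here block 9 is claimed by both "x" and "y", so the second claimant is reported.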
Because the blocks on TBM tapes eventually wear out,
two utilities were written to create backup copies of Data-
computer files. The first utility is automatically run by
the Datacomputer every night. It walks the Datacomputer's
directory tree and creates backup copies of recently crea-
ted/updated files. The storage maps for all copies of a
file are chained together on the file's directory page. In
the event that a TBM tape is accidentally damaged, the
storage map pointers for all affected files can be simply
changed to point to their previous file copies, resulting in
little or no loss of data.
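The recovery step described above amounts to dropping the damaged copy from each file's chain so that the previous copy becomes current. The following sketch assumes a chain stored newest-first as (tape, first block) pairs; that layout and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryPage:
    # Newest copy first; each entry is (tape_id, first_block). Assumed layout.
    copies: list = field(default_factory=list)

    def current(self):
        return self.copies[0] if self.copies else None


def recover_from_damaged_tape(directory, bad_tape):
    """Drop copies on the damaged tape so files fall back to older copies.

    Returns the names of files that had no surviving copy.
    """
    lost = []
    for name, page in directory.items():
        page.copies = [c for c in page.copies if c[0] != bad_tape]
        if not page.copies:
            lost.append(name)
    return lost


directory = {
    "seismic": DirectoryPage([("T7", 120), ("T3", 40)]),
    "survey": DirectoryPage([("T7", 300)]),
}
lost = recover_from_damaged_tape(directory, "T7")
```

A file with an earlier backup falls back to it; a file whose only copy was on the damaged tape is reported as lost, which is the case the nightly backup is meant to make rare.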
Since space on TBM tapes is never physically deleted, a
utility was written to copy the latest version of all unde-
leted files from one tape to another. The old tape can then
be archived or reused.
A new version of the Datacomputer software will fre-
quently contain directory modifications that make it incom-
patible with older versions. The normal procedure for up-
grading a database for a new software release was to dump
all data and directory information and reload everything
with the new version. With the large amounts of data stored
on TBM this is no longer possible. A utility was implement-
ed to dump and reload only the directory information.
Lastly, TBM tapes are now mountable/demountable per operator
command. This permits very large or seldom used files, or
full TBM tapes, to be stored off-line and mounted only when
needed by specific users.
4.1.3 Accounting
The accounting package provides a facility for monitor-
ing costs incurred by individual users accessing the Data-
computer. Services support for this effort included gener-
ating a history file of dynamic charges (i.e., pages
read/written, cpu time) incurred during user sessions and
implementing a utility routine. This utility is run at the
end of a billing period. It walks the Datacomputer's directory
tree and records static charges (i.e., file space, directory
space) associated with each node in the directory. A
facility to allow the database administrator to mark certain
nodes as billable and to ensure that all users are logged in
beneath a billable node was also implemented.
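One way to picture the end-of-period walk is a recursive traversal that charges each node's space to the nearest billable node at or above it. The report does not state how charges are rolled up, so that rule, the page-based units, and the names below are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    file_pages: int = 0
    directory_pages: int = 0
    billable: bool = False
    children: list = field(default_factory=list)


def static_charges(root):
    """Total file and directory pages per billable node (assumed roll-up)."""
    totals = {}

    def walk(node, payer):
        if node.billable:
            payer = node.name
        if payer is not None:
            space = node.file_pages + node.directory_pages
            totals[payer] = totals.get(payer, 0) + space
        for child in node.children:
            walk(child, payer)

    walk(root, None)
    return totals


tree = Node("root", directory_pages=1, children=[
    Node("CCA", directory_pages=1, billable=True, children=[
        Node("dftp", file_pages=50, directory_pages=1)]),
    Node("MIT", directory_pages=1, billable=True, children=[
        Node("survey", file_pages=600, directory_pages=2)]),
])
totals = static_charges(tree)
```

Space above any billable node (the root here) is charged to no one in this sketch.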
4.1.4 Directory Security
A general mechanism to allow deferred privileges was implemented.
A deferred privilege is one that is granted or
denied at one level in the directory tree but does not take
effect until the next deeper level. The immediate result of this
was to prevent users from modifying their own space allocation
limits.
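The effect of deferral can be sketched as follows. This is one possible reading of the rule, not the Datacomputer's privilege-block logic: settings are listed along the path from the root to the target node, and a deferred setting made at the target node itself is ignored there. The representation is hypothetical.

```python
def effective_privilege(path_settings):
    """Return the setting in force at the last node of the path.

    path_settings: root-first list with one entry per node on the path,
    each either None or a (setting, deferred) pair. A deferred setting
    applies only below the node where it was made (assumed semantics).
    """
    in_force = None
    last = len(path_settings) - 1
    for depth, entry in enumerate(path_settings):
        if entry is None:
            continue
        setting, deferred = entry
        if deferred and depth == last:
            continue  # made at the target node itself: not yet in effect
        in_force = setting
    return in_force


# A user node carries a deferred grant of "modify allocation limit".
at_own_node = effective_privilege([("deny", False), ("grant", True)])
at_child = effective_privilege([("deny", False), ("grant", True), None])
```

At the user's own node the earlier denial still holds, so the user cannot raise that node's limit; one level down the grant takes effect.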
4.1.5 Maintenance / Testing
During this reporting period at least two and occasion-
ally three Datacomputers were being actively supported. The
Version 1 Datacomputer was operational throughout the re-
porting period. It is currently being phased out as the
last of its database is transferred to the Version 3 Data-
computer. A preliminary version of the TBM Datacomputer was
available for limited network access prior to the Version 2
release. The Version 2 Datacomputer became operational in
October 1976.
A great deal of time was spent testing Services code
for both the Version 2 and Version 3 releases. In addition,
the first month of the reporting period was spent testing
and debugging the Datacomputer/TBM interface.
4.2 The Request Handler
During the second half of the year, the majority of the
work involving the request handler section of the Datacom-
puter system fell into two areas: software development and
system maintenance.
4.2.1 Datacomputer Version 2
Datacomputer Version 2 was released mid-way through the
reporting period. Although some software development took
place prior to the release of Version 2, the majority of the
(Version 2) effort was directed toward testing and system
integration.
4.2.1.1 Space Allocation
Users have been given the option of exercising more de-
tailed control over the allocation of space for their files.
The system continues to supply default allocation parameters
that will suffice for most applications. However, with the
advent of the mass memory, the variation in possible file
sizes has passed the point where default allocation parame-
ters will serve all users reasonably.
Allocation of space in the Datacomputer is done in
terms of physical blocks on the mass memory, each of which
holds 1,032,192 bits. This block is the basic allocation
unit, and its size is referred to as an "M", to reflect its
approximation to a megabit. It is also used when the user
specifies an allocation limit, as the ",M=" option of the
CREATE and MODIFY commands. Output from a LIST command
(%DIRECTORY or %INFORMATION options) describes space allocated,
used, or charged, and is now given in Ms correct to 2
decimal places.
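The arithmetic behind the "M" unit is simple enough to show directly. The block size is taken from the text; the helper names are illustrative, and rounding whole-block allocations upward is an assumption.

```python
BITS_PER_M = 1_032_192  # one physical mass-memory block, per the text


def format_m(bits):
    """Express a size in Ms, correct to 2 decimal places."""
    return f"{bits / BITS_PER_M:.2f}"


def blocks_needed(bits):
    """Whole blocks required to hold the given number of bits (assumed)."""
    return -(-bits // BITS_PER_M)  # ceiling division


one_megabit = format_m(1_000_000)
```

A true megabit is therefore slightly less than one M, which is why the text calls the unit an approximation.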
Several options have been added to give users more con-
trol over various aspects of the space allocation for their
files. All of these new options have default values which
are compatible with the internal values used by the Version
1 Datacomputer; thus applications that were designed for the
Version 1 system may use default values for the allocation
options in the Version 2 (and 3) Datacomputer. Most new
Datacomputer applications may use default values for these
parameters.
If a file contains variable length containers which may
be updated, it must have the CHAPTER option specified on it.
This causes the base to be broken down into smaller pieces,
called chapters, and thereby eases the problem of expanding
or shrinking records in the middle of a file. It also
causes space to be left unused in each chapter to allow for
possible expansion.
Two parameters are provided to allow the user to
specify the manner in which the file is chaptered: CF
(Chapter Fill level) and CR (Chapter Record count). Both
options specify an integer value. CR is the number of
records to be included in each chapter and CF is the per-
centage of a chapter which is to be filled with data when it
is first written.
The default value for CF is 80; for CR, it is the
number of records that will fit in a 1 M block, given the
average record size calculated from the file description,
and the (default or explicit) value of CF.
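As a worked illustration of the default just described, the following sketch derives CR from the average record size and CF. The exact rounding used by the Datacomputer is not stated, so truncation toward zero is an assumption, as is the function name.

```python
BITS_PER_M = 1_032_192  # size of a 1 M block in bits, from section 4.2.1.1


def default_cr(avg_record_bits, cf=80):
    """Records per chapter when a 1 M block is filled to cf percent."""
    usable_bits = BITS_PER_M * cf // 100
    return usable_bits // avg_record_bits


records_at_default_fill = default_cr(1000)
records_at_full_fill = default_cr(1000, cf=100)
```

With 1,000-bit records, a chapter filled to the default 80 percent holds 825 records, leaving about a fifth of the block free for later expansion.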
The %DESCRIPTION option of the LIST command will show
these values for a chaptered file as specified or defaulted.
Three new parameters were implemented to control the
space allocated for inversion: IA (unique number of
attribute value pairs), ID (inversion density) and II
(inversion increment). The Datacomputer Version 2 User
Guide presents a comprehensive explanation on how and when
to use these options. The combination of default values
for these parameters results in an inversion allocation of 1
M and should suffice for most applications.
The %DESCRIPTION option of the LIST command will list
the values of IA and ID only if they have been specified by
the user; it will list II whether it was explicitly set by
the user or left as a default.
4.2.1.2 DIRECT Mode
A new mode option, DIRECT, has been added to the OPEN
and MODE commands. There are now two inversion-related mode
options for these commands: DIRECT and DEFER. Mode options
pertain to the internal manner in which the addition of
records to inverted files is handled by the Datacomputer.
DEFER is appropriate in almost all cases and is now the
default. DIRECT causes the system to update the inversion
data structure for every inverted container’s value as it is
written to the base and should be specified only when the
number of records being added to the file is exceedingly
small. DEFER causes the system to batch a group of inverted
values and update the inversion structure 'occasionally'
during the request. In most cases, this is much more efficient.
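The difference between the two modes can be seen in a toy inverted index that counts how often the index structure is touched. This is only a sketch of the batching idea; the batch size, the in-memory dictionary, and the class name are assumptions and do not reflect the Datacomputer's inversion format.

```python
class Inversion:
    def __init__(self, mode="DEFER", batch_size=3):
        self.mode = mode
        self.batch_size = batch_size  # hypothetical batching threshold
        self.index = {}               # value -> list of record numbers
        self.pending = []
        self.index_updates = 0        # times the index structure was touched

    def add(self, record_no, value):
        self.pending.append((value, record_no))
        if self.mode == "DIRECT" or len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # Sorting groups equal values so each is visited once per batch.
        for value, record_no in sorted(self.pending):
            self.index.setdefault(value, []).append(record_no)
        self.pending.clear()
        self.index_updates += 1


direct = Inversion("DIRECT")
deferred = Inversion("DEFER", batch_size=3)
for number, value in enumerate(["b", "a", "b", "c", "a", "b"]):
    direct.add(number, value)
    deferred.add(number, value)
direct.flush()
deferred.flush()
```

Both modes end with the same index, but DIRECT touches the structure once per record while DEFER touches it once per batch.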
4.2.2 Datacomputer Version 3
In the second half of the reporting period, development
efforts were concentrated on the implementation of new
features for the Version 3 Datacomputer.
4.2.2.1 File Groups
Under support from a related contract from the ARPA
Nuclear Monitoring Research Office, the File Group feature
was implemented and made available to all Datacomputer users
with the release of Version 3.
Modifications were made to all levels of the compiler
and the command handler to support this feature.
The CREATE command was expanded to support the creation
of a group node. A group node itself is a special kind of
file which defines the members of the group. Two options
were added to the description mechanism for files. The
logical constraint option (,LC=<boolean>) allows the system
to determine which subfiles of a group should be accessed
during the execution of a request. The automatic include
option (,GROUP=<group-pathname>) causes the file to be included
as a subfile of the named group at creation time.
The user must take the initiative in defining and main-
taining a group's domain using three new commands. INCLUDE
causes the system to include a file as a subfile in a group.
At INCLUDE time, the user may also specify a logical con-
straint. EXCLUDE causes a subfile to be marked as deleted
from the domain of a group. COMPRESS causes the system to
garbage collect the group's domain, removing all subfiles
marked as deleted.
In support of these three new commands, a general mechanism
for internal requests was implemented. An internal
request is a request initiated by the Datacomputer itself
rather than a user. INCLUDE results in an internal request
to append a record to the group's list of members describing
the newly included subfile. EXCLUDE results in an internal
request to update this list, marking the subfile as deleted.
COMPRESS results in the list being rewritten, skipping all
records marked as deleted.
A new option, %DOMAIN, was added to the LIST command.
This option causes the system to read the list of the files
in the group (its "domain"), and output the names and
logical constraints of all non-excluded subfiles.
The pre-compiler, compiler and code generator were modified
to handle requests run on groups. Essentially, a loop
is generated to read the group's domain. For each non-
excluded subfile in the group, an analysis is performed on
the subfile's logical constraint and the request qualification
to determine if the subfile must be accessed. If it is
determined that the subfile should not be accessed, the
'inner-loop' which would process the subfile is skipped and
the next subfile is considered.
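The outer loop over a group's domain can be sketched as below. Real logical constraints are Datalanguage booleans; to keep the example small, each constraint is assumed here to be a closed range on a single key, and the request qualification is a range on the same key. All names are hypothetical.

```python
def ranges_overlap(a, b):
    """True when closed ranges a and b share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def subfiles_to_access(domain, qualification):
    """Names of subfiles whose inner loop must run for this request."""
    selected = []
    for sub in domain:
        if sub["excluded"]:
            continue  # marked deleted by EXCLUDE
        constraint = sub["constraint"]
        if constraint is not None and not ranges_overlap(constraint, qualification):
            continue  # the constraint shows no record can qualify
        selected.append(sub["name"])
    return selected


domain = [
    {"name": "q1", "excluded": False, "constraint": (1, 90)},
    {"name": "q2", "excluded": False, "constraint": (91, 181)},
    {"name": "q3", "excluded": True, "constraint": (182, 273)},
    {"name": "q4", "excluded": False, "constraint": None},
    {"name": "q5", "excluded": False, "constraint": (300, 365)},
]
selected = subfiles_to_access(domain, qualification=(80, 100))
```

A subfile with no constraint must always be read, since nothing is known about its contents.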
4.2.2.2 Functions
Four special arithmetic functions were added to Datalanguage
and made available to all Datacomputer users with
the release of Version 3. Each of these functions may be
used semantically as an expression or on the left hand side
of a relation.
GCDIST returns the great circle distance from position
A to position B accurate to the nearest nautical mile.
BEARING returns the great circle bearing from position A to
position B accurate to the nearest degree. RLDIST returns
the rhumb line distance from position A to position B accurate
to the nearest nautical mile. COURSE returns the rhumb
line course from position A to position B accurate to the
nearest degree.
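For readers unfamiliar with these navigation quantities, the sketch below computes them with standard spherical formulas. It is not the Datalanguage implementation: the spherical-earth model (one nautical mile per minute of arc), the (latitude, longitude) argument form in degrees, and the rounding of angles to whole degrees are assumptions, and positions at the poles are not handled.

```python
import math


def _rad(p):
    return math.radians(p[0]), math.radians(p[1])


def gcdist(a, b):
    """Great circle distance in nautical miles (haversine formula)."""
    (lat1, lon1), (lat2, lon2) = _rad(a), _rad(b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    angle = 2 * math.asin(min(1.0, math.sqrt(h)))
    return round(math.degrees(angle) * 60)


def bearing(a, b):
    """Initial great circle bearing in degrees, 0 to 359."""
    (lat1, lon1), (lat2, lon2) = _rad(a), _rad(b)
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return round(math.degrees(math.atan2(y, x))) % 360


def _rhumb(a, b):
    (lat1, lon1), (lat2, lon2) = _rad(a), _rad(b)
    dlat = lat2 - lat1
    dlon = (lon2 - lon1 + math.pi) % (2 * math.pi) - math.pi  # shorter way
    # Difference in Mercator-projected latitude.
    dpsi = math.log(math.tan(math.pi / 4 + lat2 / 2)
                    / math.tan(math.pi / 4 + lat1 / 2))
    q = dlat / dpsi if abs(dpsi) > 1e-12 else math.cos(lat1)
    return dlat, dlon, dpsi, q


def rldist(a, b):
    """Rhumb line distance in nautical miles."""
    dlat, dlon, _, q = _rhumb(a, b)
    return round(math.degrees(math.hypot(dlat, q * dlon)) * 60)


def course(a, b):
    """Constant rhumb line course in degrees, 0 to 359."""
    _, dlon, dpsi, _ = _rhumb(a, b)
    return round(math.degrees(math.atan2(dlon, dpsi))) % 360
```

Along the equator or along a meridian the great circle and the rhumb line coincide, which gives a convenient check: a quarter of the equator is 5,400 nautical miles by either measure.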
4.2.2.3 Priority
A comprehensive priority mechanism is planned for a
future version of the Datacomputer. The Datalanguage to
support this mechanism was implemented during the reporting
period.
unless overridden.
A new option was added to the CREATE (node) and MODIFY
commands (,P=<integer>). Its function is to set the priority
limit at a node, similar to the allocation limit option.
The PRIORITY command was implemented. Its function is
to override the system default priority level for a session,
up to a user's priority limit.
A new privilege option was added to the CREATEP command
(,Q=<integer>). Its function is to override the system
default priority level when matched, without the need for
issuing a PRIORITY command.
4.2.3 Maintenance and Testing
Due to the fact that two versions of the Datacomputer
were released during this reporting period, a large amount
of time was spent on testing. Our catalogue of test decks
was expanded to ensure comprehensive testing of old and new
features. All of the test decks were run twice, once in
August prior to the release of Version 2 and again in Decem-
ber prior to the release of Version 3.
Chapter 5
Hardware / Site / Operations
During the first half of 1976, an Ampex Tera-Bit Memory
system (TBM) was installed at CCA. After extensive tests
during the second half of 1976, the TBM was accepted.
5.1 Site Improvements
Major site improvements were made to accommodate the
TBM near the end of 1975. An additional air conditioning
unit became operational in January 1976 and in February the
TBM hardware was delivered.
5.2 The TBM
The TBM system at CCA is a 200 billion bit configura-
tion with 50 billion bits per drive and four drives. The
following figure shows the general structure. The TBM's
transfer rate is 6 million bits per second, similar to an
IBM 3330 type disk. A high speed seek from one end of a
tape to the other takes approximately 45 seconds.
By March 1976 data had been transferred to and from the
Ampex TBM but Ampex continued to work on ironing out diffi-
culties in the TBM at CCA through the end of June. Agree-
ment on a mutually acceptable detailed test plan was reached
in early June and formal acceptance testing started in July.
The hardware performed well during this test but some diffi-
culties were encountered with the internal TBM software par-
ticularly in the CIU (Channel Interface Unit, a PDP-11/05)
and the SCP (System Control Processor, a PDP-11/35). A
further software test in August demonstrated that these
problems had been corrected and the system was accepted.
5.3 Operations
The Version 1 Datacomputer as described in its User
Manual was operational using three 3330 type disk spindles
for storage and provided service over the Arpanet at the
start of this reporting period. When the Version 2 Datacomputer
was made publicly available, using the TBM, some information
was transferred into it from the operational
Version 1 Datacomputer and the Version 1 Datacomputer was
squeezed down to two 3330 disk spindles to provide staging
room for Version 2.
The Version 2 Datacomputer was provided experimentally
starting in June 1976 using a 2314 type disk as staging.
Starting in October 1976, Version 2 using 3330 type disks
for staging and TBM tape for storage became the standard
Datacomputer.
At the end of 1976, the Version 3 Datacomputer, with
the enhancements described above, became available over the
network.
Chapter 6
Seismic Data Base Support
As mentioned in the Introduction, some work on the
Datacomputer is funded under a separate contract from the
Nuclear Monitoring Research Office (NMRO) of ARPA. A short
discussion of this work is included here because of its in-
timate relation to the work of the primary Datacomputer de-
velopment contract.
6.1 Overview
This work is directed toward establishing an on-line,
real-time data base of seismic information from a world-wide
network of monitoring sites and toward making this data
available to computers for seismic analysis and other
purposes.
Since the system will work in real time and some of the
data will be retrieved by computers on the Arpanet, the
Arpanet was chosen as the most appropriate communications
medium available for the entire system. Seismic data is
collected at the Seismic Data Analysis Center from sensors
at various locations around the world. The data is then
transmitted from the SDAC Control and Communications Pro-
cessor to CCA over the network. At CCA a small highly reli-
able computer known as the Seismic Input Processor, or SIP,
absorbs the incoming data and stores it in a disk buffer.
Periodically, the SIP connects to the Datacomputer (again
via the network) and bursts the collected data into the
Datacomputer at a very high rate. A limited amount of data
from sources that do not report in real time will also be
sent to the Datacomputer directly.
6.2 The SIP and Network Considerations
The SIP became operational during the first half of
1976. In the latter part of 1976, NORSAR and LASA data were
being successfully processed through the SIP.
Observation of network behavior while real time seismic
data is being sent from SDAC to the SIP and older seismic
data is being burst from the SIP to the Datacomputer leads
to the conclusion that the current network capacity at CCA
is marginal for this application. The difficulty is due to
limited processor power and very limited reassembly buffers
in the 516 IMP. Some relief was gained last year when the
316 TIP at CCA was replaced with a 516 IMP to increase the
bandwidth of CCA's network node. Temporary relief was also
provided by moving the Lincoln Laboratories VDH that had
terminated at CCA to another IMP. However, considering the
much higher loads planned in the future, the most appropri-
ate long run solution to this problem appears to be the in-
stallation of a properly configured PLURIBUS IMP at CCA.
Action has been taken to accomplish this and installation is
expected to take place in 1977.
Chapter 7
Summary
Over the course of this contract the Datacomputer has
grown from a very basic system into a sophisticated large
scale data storage facility with substantial data management
capabilities. At the beginning of this contract the Data-
computer supported only a limited amount of storage and a
restricted set of user functions. Data description and man-
ipulation facilities were limited. Today the system pro-
vides facilities for data sharing among dissimilar machines
on the Arpanet, rapid access to large on-line files, storage
economy through shared use of tertiary store and improved
access control.
Internally, all Datacomputer modules fall into one of
two principal subsystems, which are called "Services" and
"Request Handler". The Request Handler is the "outer" layer
of the Datacomputer. It provides the interface to the Datacomputer
user, parses and compiles Datalanguage, formats and
interprets user data, and is in general control of overall
Datacomputer functions. Services modules provide the lower-
level functions internal to the Datacomputer itself, such as
device and network I/O control, core and disk buffer management,
file directory services, and operating system interface.
7.1 Services
When this contract began, the Services subsystem of-
fered rudimentary capabilities. Many of the then-existing
modules have been extensively reworked, and several major
modules have been added. Major changes or additions have
been made in the areas of tertiary memory I/O functions,
testing and support utilities, file security and accounting.
Each of these facilities is described in more detail below.
7.1.1 Tertiary Memory Support
The most important addition to Services is the support
system for the tertiary memory. This has required implemen-
tation of the data-staging mechanism known as SDAX, a gener-
alized data-page moving mechanism ("slosher") capable of
moving large amounts of data between TBM's, 3330-type disks,
RP02-type disks and core. SDAX reduces I/O wait times for
users who have data stored on tertiary memory by staging
(portions of) file data onto dedicated 3330-type disks. The
user's file data operations are then performed on the staged
data and at an appropriate time modified data is written
back to tertiary memory or simply deleted from disk. These
operations are completely transparent to the Datacomputer
user. Implementation of SDAX and slosher routines required
extensive modifications throughout many other Services
modules, including the directory tree, error and core-
management handlers.
7.1.2 Utility Support Programs
In addition and as an enhancement to tertiary memory
device support, a number of utility functions were added to
the Datacomputer. The most important of these are:
The directory cross-checker, which exhaustively
examines the internal consistency of the file directory;
Backup and reload utilities for Datacomputer
files, which [...] 6,000 [...]
Extensive testing facilities, including a "Q-
tester" which simulates the activity of several
concurrent network Datacomputer users in creating
and deleting nodes and files, and reading and
writing patterned data to these files;
Greatly improved error-handling facilities. The
Datacomputer is able to recover and continue from a
very large number of potential errors, especially
including device and network I/O problems.
7.1.3 Security and Accounting
Two other major areas of work have been the addition of
a sophisticated security subsystem and inclusion of an ac-
counting package. Datacomputer security is provided by a
privilege block and password system which operates at every
level of the directory tree, is user-settable for all nodes
under that user's control, and which offers per-user accessibility
to any given file. The accounting package offers
a facility for prorating costs of Datacomputer operation,
data storage and network interface to individual sites ac-
cessing the Datacomputer.
7.2 Request Handler
Key results of the Request Handler development effort
have fallen into the following categories: data description,
data manipulation, and efficiency-enhancing mechanisms. The
following subsections will consider each of these
issues in turn.
7.2.1 Data Description
Data is stored by the Datacomputer in FILES whose
contents are described in a directory maintained by the
system. Data is transmitted to or from the Datacomputer
through PORTs which are also described in the system directory.
The Datacomputer's role as a central point for data
in a heterogeneous network gives it the rather unusual re-
quirement of being able to deal with a large variety of data
types and machine representations of data. The data des-
cription facilities were greatly expanded to handle strings
whose character set is EBCDIC or BCD as well as ASCII; one's
complement, two's complement and unsigned binary integers;
and non-binary integers (signed as well as unsigned octal,
decimal and hexadecimal).
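To illustrate why the machine representation must be part of the description, the sketch below decodes one bit pattern under each of the three binary integer conventions named above, and parses non-binary (character-coded) integers. The function names and the 8-bit width used in the usage note are assumptions for illustration, not Datacomputer interfaces.

```python
def decode_unsigned(bits: int, width: int) -> int:
    """Interpret the low `width` bits as an unsigned binary integer."""
    return bits & ((1 << width) - 1)


def decode_twos_complement(bits: int, width: int) -> int:
    """Two's complement: a set sign bit means value - 2**width."""
    value = decode_unsigned(bits, width)
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


def decode_ones_complement(bits: int, width: int) -> int:
    """One's complement: a negative number is the bitwise inverse of its magnitude."""
    mask = (1 << width) - 1
    value = bits & mask
    sign_bit = 1 << (width - 1)
    return -(~value & mask) if value & sign_bit else value


def decode_text_integer(text: str, base: int) -> int:
    """Non-binary integer: optionally signed digits in octal, decimal or hexadecimal."""
    return int(text, base)
```

With an 8-bit container, the pattern 0xFE reads as 254 unsigned, -2 in two's complement, and -1 in one's complement, so the same stored bits yield three different values depending on the description.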
A full set of description options was implemented. The
byte size option specifies the number of bits in each byte
of the container. The fill character option specifies which
value the system should use to fill the container if no data
is assigned to it or to pad data which does not fill the
container to its minimum size. A terminator tells the Data-
computer where to find the end of data in a variable-length
container. There are three terminator options: count, de-
limiter and punctuation. Virtual data is data that is not
stored within the file, but is system maintained and access-
ible by the user. No space is allocated for a container
having a virtual option specified; the system supplies a
value which may be retrieved and used like the value of a
'normal' container in assignments and expressions. The
Datacomputer supports two virtual options, virtual expres-
sions and virtual indices. A virtual expression may contain
any arithmetic or string operators on constants or other
containers within the same file. When such a container is
referenced, the system calculates the value of the expression.
A virtual index supplies the member's position in the closest
list (i.e., the record number at the outermost level) and may
be used in any expression.
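The count and delimiter terminators and the fill character can be sketched as follows. The byte layouts (a one-byte leading count, a single delimiter sequence) and the function names are assumptions made for the example; the punctuation option is omitted because its semantics are not spelled out in this summary.

```python
from typing import Tuple


def read_counted(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Count terminator: a leading length byte says how much data follows.

    Returns the data and the position just past it.
    """
    length = buf[pos]
    start = pos + 1
    return buf[start:start + length], start + length


def read_delimited(buf: bytes, pos: int, delimiter: bytes) -> Tuple[bytes, int]:
    """Delimiter terminator: data runs up to the next delimiter.

    Returns the data and the position just past the delimiter.
    """
    end = buf.index(delimiter, pos)
    return buf[pos:end], end + len(delimiter)


def pad(data: bytes, minimum: int, fill: bytes = b" ") -> bytes:
    """Fill character: pad data that does not reach the container's minimum size."""
    return data + fill * max(0, minimum - len(data))
```

Both readers return the next read position, so successive variable-length containers can be read one after another from the same buffer.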
7.2.2 Data Manipulation
From the user's point of view the Datacomputer is a
remotely-located utility. It would be impractical to use
such a utility if, whenever the user wanted to access or
change any portion of his file, the entire file had to be
transmitted to him. Accordingly, Datalanguage has been
expanded to be self-contained. The user sends a 'request',
which causes the proper functions to be executed at the
Datacomputer without requiring entire files to be shipped
back and forth. Requests are composed of one or more state-
ments which transfer data (assignment statement); group
statements so that two or more statements are treated as a
single statement (BEGIN-END block); declare local variables
(DECLARE statement); selectively execute statements (the IF-
THEN-ELSE statement and the UNTIL loop); selectively
receive, transfer, and transmit data (FOR and APPEND loops);
update containers (UPDATE loop); delete records (REMOVE
statement); and print messages (COMMENT, ERROR, ALERT,
ABORT, QUIT statements).
The first implementation of the UPDATE loop restricted
the containers which could be changed to fixed-length ones.
Variable-length containers are typically the best choice in
terms of storage efficiency and ease of data handling where
strings of characters (such as names and addresses) are in-
volved. With these considerations in mind, the CHAPTERed
file mechanism was implemented. Chaptering is a technique
which divides a file into parts, called chapters, to facili-
tate the insertion and deletion of data. Each chapter can
be independently expanded or contracted to accommodate
shrinking or growing records.
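The benefit of chaptering can be pictured with a toy model in which each chapter is a separate list of records, so that a record that grows or is removed disturbs only its own chapter. The class below is a sketch under that assumption; the names and the fixed initial chapter size are hypothetical and say nothing about the actual on-disk layout.

```python
class ChapteredFile:
    """Toy chaptered file: records are grouped into independently sized chapters."""

    def __init__(self, records, chapter_size):
        # Split the initial records into chapters of `chapter_size` records each.
        self.chapters = [list(records[i:i + chapter_size])
                         for i in range(0, len(records), chapter_size)]

    def locate(self, record_number):
        """Map a 0-based record number to (chapter index, slot within chapter)."""
        for c, chapter in enumerate(self.chapters):
            if record_number < len(chapter):
                return c, record_number
            record_number -= len(chapter)
        raise IndexError("record number out of range")

    def update(self, record_number, new_value):
        """Replace a record; returns the only chapter that had to change."""
        c, slot = self.locate(record_number)
        self.chapters[c][slot] = new_value
        return c

    def remove(self, record_number):
        """Delete a record; returns the only chapter that had to change."""
        c, slot = self.locate(record_number)
        del self.chapters[c][slot]
        return c

    def chapter_bytes(self, c):
        """Total size in bytes of the records held in chapter `c`."""
        return sum(len(r) for r in self.chapters[c])
```

Updating a short name to a longer one changes the size of one chapter and leaves the sizes of all other chapters untouched.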
In order to provide a full set of computational capa-
bilities, arbitrarily complex arithmetic, Boolean and string
expressions are handled by the Datacomputer. In addition to
the representation of different data types, a full set of
conversions from one type to another must be available so
that assignment, arithmetic and comparison operations across
data types are possible.
7.2.3 Efficiency
Although often not as visible to the user as Data-
language, efficiency considerations during the manipulation
of data are an important aspect of data management systems,
especially when dealing with large volumes of data.
For very large files, it is desirable to break up the
data into physically smaller units so that each subfile or
component of the group can be individually accessed,
checkpointed, dumped, validated, etc. File groups were
implemented to give the user control over how the data is
broken up into physically smaller, more manageable units,
which units are online or offline, etc. The user may also
specify a logical constraint on each member of a group. A
logical constraint is a Boolean which limits the records in
one subfile to having specific values in specified container(s)
of the file. The system responds to each reference to a
group's name by taking the appropriate action to handle a
set of subfiles. The logical constraint is a very important
efficiency consideration. When looping on a group without a
qualification (WITH <Boolean>), each subfile has to be
opened and operated on. With an appropriate qualification
(one based on the logical constraint), the system need only
reference those subfiles selected. An analysis of the
subfile's logical constraint and the request qualification
is performed. If the analysis determines that no record of
the subfile would be selected by the request qualification,
the subfile is skipped. Otherwise the subfile is accepted
for processing.
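The subfile-skipping analysis can be sketched for the simplest case, in which both the logical constraint and the request qualification are sets of permitted values per container. That restriction, the function names, and the subfile and container names in the usage note are assumptions for illustration; the analysis of general Booleans is not shown.

```python
def may_contain(constraint: dict, qualification: dict) -> bool:
    """Return False only when no record of the subfile can satisfy the qualification.

    Both arguments map a container name to the set of values it may take.
    A container absent from the constraint is unconstrained in that subfile.
    """
    for container, wanted in qualification.items():
        allowed = constraint.get(container)
        if allowed is not None and allowed.isdisjoint(wanted):
            return False  # constraint and qualification cannot both hold
    return True


def select_subfiles(group: dict, qualification: dict) -> list:
    """Names of the subfiles that must be opened for this qualification."""
    return [name for name, constraint in group.items()
            if may_contain(constraint, qualification)]
```

For a group whose subfiles are constrained by year, a request qualified on YEAR 1976 would open the 1976 subfile and any unconstrained subfile, and skip the 1975 subfile without reading it.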
An inversion is a secondary data structure that the
Datacomputer can use to improve its efficiency in retrieving
data by content from a file. Specifically, an entry in the
inversion is constructed for every container with the
inversion option. For each data value which occurs for the
container, the inversion contains pointers to all the outer-
most list members for which that container has that value.
When an inverted file is created, storage space is allocated
for the secondary data structure. Although fixed-length
inverted strings were previously supported by the Datacom-
puter, significant improvements have been made. Indirect
inversion was implemented, permitting the retrieval of
outermost list members based on the values of inverted con-
tainers in any inner list. All elementary containers, in-
cluding variable-length strings, may be inverted. An
inversion is not only automatically constructed by the Data-
computer when the file is loaded, it is also automatically
maintained when data is appended or updated.
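A minimal model of an inversion is a mapping from each data value to the set of outermost record numbers in which it occurs. The sketch below assumes records are dictionaries and that an inner list is represented as a Python list of values under the container's name; the class and method names are hypothetical.

```python
from collections import defaultdict


class Inversion:
    """Toy inversion: value -> set of outermost record numbers holding that value."""

    def __init__(self, container):
        self.container = container
        self.entries = defaultdict(set)

    def add(self, record_number, record):
        """Index one record, as done when a file is loaded or appended to."""
        value = record[self.container]
        # Indirect inversion: every value in an inner list points back
        # to the outermost record that contains it.
        values = value if isinstance(value, list) else [value]
        for v in values:
            self.entries[v].add(record_number)

    def update(self, record_number, new_record):
        """Keep the inversion consistent when a record is changed."""
        for members in self.entries.values():
            members.discard(record_number)
        self.add(record_number, new_record)

    def lookup(self, value):
        """Record numbers whose container holds `value`, in ascending order."""
        return sorted(self.entries.get(value, ()))
```

A retrieval by content then becomes a single lookup rather than a scan of every record.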
Complex Boolean expressions, those involving several
comparisons, fall into three classes: those in which all
comparisons are evaluable from the inversion, those containing
no comparisons evaluable from the inversion, and those which
mix the two kinds of comparisons. The first two classes
pose no problems; the Datacomputer will use the inversion to
evaluate expressions in the first category, and not for
expressions in the second category. For those Booleans in the
third category, the Datacomputer decomposes the expression
into two Booleans. The first, based on inverted comparisons,
will be used to retrieve the data. The second, known
as 'direct search' and based on the non-inverted comparisons,
will be applied only to the data selected by the
inverted Boolean.
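The decomposition can be sketched for the restricted case of equality comparisons joined by AND; that restriction is an assumption of the example, as are all function names. The inverted comparisons narrow the candidate set using the inversion alone, and the direct search then examines only those candidates.

```python
def build_inversions(records, containers):
    """Build value -> record-number sets for each inverted container."""
    inversions = {c: {} for c in containers}
    for number, record in enumerate(records):
        for c in containers:
            inversions[c].setdefault(record[c], set()).add(number)
    return inversions


def split_comparisons(comparisons, inverted_containers):
    """Separate (container, value) comparisons into inverted and direct ones."""
    inverted = [c for c in comparisons if c[0] in inverted_containers]
    direct = [c for c in comparisons if c[0] not in inverted_containers]
    return inverted, direct


def retrieve(records, inversions, comparisons):
    """Evaluate an AND of equality comparisons, using inversions where possible."""
    inverted, direct = split_comparisons(comparisons, inversions)
    candidates = set(range(len(records)))
    for container, value in inverted:
        # Inverted Boolean: narrow the candidates without reading any record.
        candidates &= inversions[container].get(value, set())
    # Direct search: read only the records that survived the inverted Boolean.
    return sorted(n for n in candidates
                  if all(records[n][c] == v for c, v in direct))
```

When every comparison is inverted, the direct search has nothing to test; when none is, every record is a candidate and the request degenerates to a full scan.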
The container address table (CAT) is used for fast re-
trieval of variable-length list members. It is automatic-
ally created for variable-length list members which contain
at least one inverted container and for chaptered files, but
may also be specified as an option in the description of a file.
The CAT provides quick access to list elements and is pri-
marily a tool for increasing efficiency at run-time. For
example, to obtain the nth element of a list of variable-
length, delimited strings with no CAT would require reading
through the first n-1 elements, searching for delimiters.
If the same list had a CAT, obtaining the nth element would
require only loading a pointer from the nth CAT slot.
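The difference between the two access paths in this example can be sketched as follows, assuming the list is held as a byte string in which every element, including the last, is followed by a one-byte delimiter. The function names and the use of byte offsets as CAT slots are assumptions for illustration.

```python
def nth_by_scanning(data: bytes, n: int, delimiter: bytes = b";") -> bytes:
    """Without a CAT: skip the first n-1 elements by searching for delimiters."""
    pos = 0
    for _ in range(n - 1):
        pos = data.index(delimiter, pos) + len(delimiter)
    end = data.index(delimiter, pos)
    return data[pos:end]


def build_cat(data: bytes, delimiter: bytes = b";") -> list:
    """Record the starting offset of every element, one slot per element."""
    cat, pos = [], 0
    while pos < len(data):
        cat.append(pos)
        pos = data.index(delimiter, pos) + len(delimiter)
    return cat


def nth_with_cat(data: bytes, cat: list, n: int, delimiter: bytes = b";") -> bytes:
    """With a CAT: load the nth slot and read the element directly."""
    start = cat[n - 1]  # n is 1-based, as in "the nth element"
    end = data.index(delimiter, start)
    return data[start:end]
```

The scan costs time proportional to n, while the CAT lookup costs one table access regardless of n, at the price of storing one slot per list member.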
A virtual index is treated as a pseudo-inversion, as
if it were an inverted container having the list member's
record number as data, although no auxiliary data structure
is maintained.