
Full text of "DTIC ADA037415: Datacomputer Project"



DATACOMPUTER PROJECT

SEMI-ANNUAL TECHNICAL REPORT

July 1, 1976 - December 31, 1976

Contract No. MDA903-74-C-0225

ARPA Order No. 2687

Approved for public release;
Distribution unlimited

Submitted to: 

Defense Advanced Research Projects Agency 
1400 Wilson Boulevard 
Arlington, Virginia 22209 







Computer Corporation of America
575 Technology Square 
Cambridge, Massachusetts 02139 


DATACOMPUTER PROJECT 


SEMI-ANNUAL TECHNICAL REPORT 


July 1, 1976 to December 31, 1976


This research was supported by the Defense Advanced Research 
Projects Agency of the Department of Defense and was moni- 
tored by the U.S. Army Research Office, Defense Supply 
Service -- Washington under Contract No. MDA903-74-C-0225.
The views and conclusions contained in this document are 
those of the authors and should not be interpreted as neces- 
sarily representing the official policies, either expressed 
or implied, of the Defense Advanced Research Projects Agency 
or the U.S. Government. 




Semi-Annual Technical Report                 Datacomputer Project

Table of Contents

3 Datacomputer Usage
  3.1 Seismic Usage
  3.2 DFTP
  3.3 IMP statistics
  3.4 SURVEY
  3.5 ACCAT
  3.6 ERDA
  3.7 Message Archiving
  3.8 NSW
  3.9 Accounting

4 Software Development
  4.1 Services
    4.1.1 Staging
    4.1.2 Terabit Memory (TBM) Support
    4.1.3 Accounting
    4.1.4 Directory Security
    4.1.5 Maintenance / Testing
  4.2 The Request Handler
    4.2.1 Datacomputer Version 2
    4.2.2 Datacomputer Version 3
    4.2.3 Maintenance and Testing

5 Hardware / Site / Operations
  5.1 Site Improvements
  5.2 The TBM
  5.3 Operations

6 Seismic Data Base Support
  6.1 Overview
  6.2 The SIP and Network Considerations

7 Summary
  7.1 Services
    7.1.1 Tertiary Memory Support
    7.1.2 Utility Support Programs
    7.1.3 Security and Accounting
  7.2 Request Handler
    7.2.1 Data Description
    7.2.2 Data Manipulation
    7.2.3 Efficiency





Semi-Annual Technical Report Datacomputer Project 

Computer Corporation of America 


Chapter 1 
Introduction 


This report describes our work on the Datacomputer
system from July 1, 1976 through December 31, 1976. The
project is supported by the Information Processing Techniques
Office of the Advanced Research Projects Agency of
the Department of Defense. The current work is being
carried out under contract MDA903-74-C-0225. Related work
discussed herein is supported by the Nuclear Monitoring Research
Office of ARPA under contract MDA903-74-0227.




Work during the reporting period falls into the
following categories: production operation of Datacomputer
Version 1; preparation and release of Datacomputer Version
2, the first version of the Datacomputer to support an Ampex
Terabit Memory System; production operation of Version 2;
and preparation and release of Datacomputer Version 3.


Chapters 2 - 6 provide detailed descriptions of this 
work. Chapter 2 is a discussion of the Datacomputer archi- 
tecture, with emphasis on the increasing levels of function- 
al abstraction beginning with the hardware and moving 
outward. Chapter 3 is a report on the usage of the Datacom- 
puter during the reporting period. Chapter 4 is a detailed 
discussion of the work on the Datacomputer software. 




Chapter 5 describes Datacomputer site, hardware, and opera- 
tions work. Chapter 6 is a brief overview of the NMRO work 
and its implications for the Datacomputer in general. 
Finally, Chapter 7 presents a brief summary of technical de- 
velopment over the duration of this contract. 




Chapter 2 


System Description 


The Datacomputer is a very large scale data storage fa- 
cility with substantial data management capabilities. Its 
design is optimized for use as a data resource in a network 
of large-scale computers which are connected via medium 
speed (50,000 bits/second) communications lines. The data 
storage functions of the Datacomputer will support the 
storage of data sets over a trillion bits, and the hardware 
facilities include an expandable Ampex TBM currently config- 
ured for 200 billion bits of storage (other mass storage 
systems could also be used). 


The development of the Datacomputer has been strongly 
affected by its nature as a network data utility. Its 
design does not preclude very fast data transfers -- the 
Datacomputer can feed data to the network or some other in- 
terface at speeds approximating the bandwidth of its storage 
devices, as long as no special processing is required -- but 
this is not how it operates in the most frequent case. The 
combination of very large storage capacities and only moderately
fast communications facilities implies that the Datacomputer
must provide powerful facilities for data selection
and subsetting, in particular to minimize data transmission
back and forth through the network. (To transmit a trillion
bits through a 50,000 bits/second channel requires about 231
days, assuming no errors or interruptions!) Furthermore, in
order to make simple changes to large numbers of records, a
facility is necessary for self-contained requests which
modify files without data transmission over the network.

(1) An earlier Semi-annual Technical Report contained a
lengthy tutorial section titled "System Description,"
regarding which a number of positive comments have been
received. In the interest of making the present document
self-contained, a somewhat shortened and updated version of
the same material is included here.
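The 231-day figure quoted above can be checked with a few lines of arithmetic. The following is only a sketch: the function name is our own, and it assumes an ideal channel with no protocol overhead, errors, or interruptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def transfer_days(bits, bits_per_second):
    """Days needed to move `bits` over an ideal, error-free channel."""
    return bits / bits_per_second / SECONDS_PER_DAY


# A trillion bits over one 50,000 bits/second line.
print(f"{transfer_days(1e12, 50_000):.1f} days")  # 231.5 days
```

Any selection or subsetting done at the Datacomputer reduces the first argument directly, which is why the design emphasizes it.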

The Datacomputer's existence in a network environment
also implies that other computer systems mediate between the 
Datacomputer and its ultimate human users. Consequently, 
functions that are not intrinsic to data management, such as 
carefully human-engineered terminal interfaces, are rele- 
gated to those other systems. This has several benefits: 
it allows work at the Datacomputer to concentrate on issues 
intrinsic to data storage and management. It allows other 
systems which concentrate on human engineering to provide 
better interfaces than would otherwise be produced. Most 
importantly, it ensures that the Datacomputer is developed 
as a resource available to other systems, capable of being 
incorporated into larger projects. Careful attention is 
paid to such issues as error flagging and resynchronization 
after serious error detection. 









Many large computer systems may be usefully examined in 
terms of their functional hierarchies. A level may be char- 
acterized in two ways. First, more fundamental operations 
which are provided by the previous level (and may already be 
abstractions themselves) are combined into new, more power- 
ful, and more abstract operations. For example, the stream 
of magnetic flux reversals seen by the disk controller 
becomes a stream of fixed (or variable) length blocks of 
binary words when seen by the operating system. A subset of 
this arbitrary collection of unformatted words is presented 
to user programs as a "file" in a "file system". Second, 
intermediate functions exist to prevent certain combinations 
of operations which would damage system integrity from oc- 
curring, and to hide other functions entirely from the next 
level out. 

The term normally used for the particular collection of 
functions available to any given level of a system hierarchy 
is "virtual machine". In many ways, the programmer working 
at level n in such a system may behave as if level n-1 were 
hardware: all n-1 functions are immutable and part of the
machine environment. Using terms that will be explained in
the rest of this section, the TENEX implementer programs a 
PDP-10; the SV programmer programs a TENEX (which looks a 
lot like a PDP-10 with some major abstractions); the Request 
Handler programmer programs an SV machine, and the ultimate 
user programs a Datacomputer. (We shall see that the set of 
functions presented by the Request Handler is equivalent to 
the Datacomputer virtual machine.) 

The following levels will be discussed in detail: 

1) The hardware consists of a Digital Equipment Cor- 
poration (DEC) PDP-10 and its supporting peripher- 
als including communications links to the ARPA 
Network and a very large storage device. 

2) The TENEX operating system is in direct control of 
the hardware resources and provides many services 
to the Datacomputer. 

3) The programs known collectively as SV or Services
are a pseudo-operating system which interacts with
TENEX, managing input/output, scheduling, and
storage strategies for Datacomputer files.

4) The Request Handler (RH) is the interface to external
processes using the Datacomputer. It
accepts control and data-management statements in
"Datalanguage", provides messages concerning the
state of the Datacomputer job to the user, and
supervises data flow in both directions under the
control of Datalanguage statements, and with the
help of the other levels of the system.

Above these four levels of processing, there are an in- 
definite number of levels of functioning, outside the actual 
Datacomputer. These are the processes on other machines 
which interface to the Datacomputer, and co-operate with it 
in the accomplishment of whatever tasks its ultimate users 
undertake.


2.2 The Hardware Level 


Conceptually, the hardware for a Datacomputer is quite 
simple. A processor of some sort is required along with 
some form of primary store (e.g., core). In addition, one 
needs a very large store (e.g., TBM) and a medium-to-high
speed communications port. A great deal of efficiency can 
be gained by adding one or more levels of intermediate 
storage such as disk. 



The hardware base of the Datacomputer as it is currently
implemented consists of a processor, an address mapping
device, three levels of store, medium and low speed communications
lines, and several I/O devices.

The processor is a Digital Equipment Corporation KA-10 
CPU (PDP-10). A Bolt Beranek and Newman "Pager" provides 
address translation for all memory references, and (along 
with software in TENEX) provides the illusion of a 256K (1K
= 1024) word primary store regardless of the size of the
physical memory. 

The real primary store in the current Datacomputer is 
336K words of 36 bit core memory. This includes five 16K 
DEC ME-10's and two 128K STOR-10's from Cambridge Memories,
Inc. PDP-10 characters are typically stored five to a word, 
so this is the equivalent of slightly more than a million 
characters of memory. 

The system has two types of secondary store. Six 
spindles of DEC RP02 disk (IBM 2314 equivalent) provide 
space for the TENEX file system. These hold about four 
million 36 bit words each, for a total of 24 million words. 
In addition, four spindles of CalComp 230 disk (IBM 3330 
equivalent) are attached to the PDP-10 via a Systems 
Concepts SA-10 IBM data channel simulator. These disks each 
will store approximately 20 million words of data and are
used as staging devices between the tertiary store and the
PDP-10.
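The capacity figures quoted for primary and secondary store can be tallied as follows. This is a sketch that uses only the numbers given in the text; the variable names are our own.

```python
K = 1024  # the report defines 1K = 1024

# Primary store: five 16K ME-10's plus two 128K STOR-10's.
core_words = 5 * 16 * K + 2 * 128 * K
core_chars = core_words * 5          # five characters per 36-bit word

# Secondary store, in 36-bit words.
rp02_words = 6 * 4_000_000           # six RP02 spindles (TENEX file system)
calcomp_words = 4 * 20_000_000       # four CalComp 230 spindles (staging)

print(core_words // K, "K words of core")        # 336 K words of core
print(core_chars, "characters of core")          # 1720320 characters of core
print(rp02_words, "words of RP02 disk")          # 24000000 words of RP02 disk
print(calcomp_words, "words of staging disk")    # 80000000 words of staging disk
```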


The main data repository of the Datacomputer is its 
tertiary store, an Ampex Tera-Bit Memory System also attached
through the SA-10. The TBM is described in detail in
Chapter 5 (Hardware / Site / Operations). 

The Datacomputer's communications equipment consists of
a connection to an Interface Message Processor (IMP) which
is in turn connected to the Arpanet. The Arpanet connection
is the Datacomputer's only channel to the outside world.
All Datacomputer usage consists of messages and data passed
back and forth through this port. Nodes in the Arpanet are
connected by up to four 50,000 bit/second phone lines, and
the combined traffic of all concurrent Datacomputer users is
limited by this (except for the special case of usage from
another host connected to the same IMP).


2.3 The Primary Operating System - TENEX 


The second level in the Datacomputer's functional hierarchy
is the TENEX operating system. An excellent overview
of the nature and facilities of TENEX can be found in
the paper by Bobrow et al. (1)

The virtual machine provided by TENEX - a PDP-10 arithmetic
processor with full memory capabilities and file/process
address space integration - has proved to be satisfactory
for Datacomputer development. However, changes to
the TENEX monitor have been necessary to optimize the Datacomputer's
performance. The Datacomputer's Network Control
Program has been modified to optimize voluminous data transfers
rather than the high level of short messages (due to
user terminal I/O) that is more typical of TENEX network
traffic. The scheduler was modified to give special consideration
to the resource utilization patterns of the Datacomputer.
For example, there is little urgency in the Datacomputer
to satisfy highly interactive jobs, but it is necessary
to respond promptly to events like high-speed disk
operations. Higher throughput with reduced overhead is
achieved by rescheduling at shorter intervals than in a
normal TENEX, but giving larger quanta of CPU time when jobs
are scheduled. Routines to support the CalComp 230 disk
drives and the TBM have been added.

(1) Bobrow, Daniel G., et al., "TENEX, a Paged Time Sharing
System for the PDP-10," Communications of the ACM, V. 15,
no. 3, March 1972, pp. 135-143.




2.4 The 


The preceding two levels of the Datacomputer system 
were not products of the development effort being discussed. 
They are described here because an understanding of their 
functions and capabilities is useful to understanding the 
functions and capabilities of the two outer layers of the 
system - Services and the Request Handler. 


These two levels constitute what could reasonably be 
called the "Datacomputer proper", and are the primary output 
of the Datacomputer project. They are conceptually and 
functionally separate - to the point of having separate personnel.
This section discusses the Services programs (hereafter
known interchangeably, and in accordance with time-honored
tradition, as SV).


SV functions as a pseudo operating system for the Data- 
computer. It provides the basic functions of a traditional 
operating system in a form which is maximally convenient for 
the construction of a user-level Datacomputer interface (of 
which the current Request Handler is but one example). In 
particular, SV provides a specialized file system, stream 
oriented input/output facilities, and a set of scheduling/monitor
functions.



The special disk area index module, known as SDAX, 
serves two critical functions. First, it controls the 
staging mechanism which brings required data pages from rel- 
atively slow tertiary store (TBM) to relatively fast second- 
ary store (3330 and RP02 disk). Pages thus stored are kept 
on disk as long as they are in use, and are migrated back to 
TBM only after they are no longer needed by active users. 
The second function of SDAX is to provide a mechanism for 
permitting access to several versions of a file by any 
number of users. It does this by a map-chaining technique 
which permits multiple readers and a concurrent updater to 
access any number of active versions of a file. 

Access to SV functions for the Request Handler is via a 
special instruction known as "SVCALL". SVCALL's exist to 
manipulate the state of Datacomputer files including reading 
and writing pages from them; to perform input and output 
over the Datacomputer's Arpanet connections; and to handle
special error conditions. 

2.4.1 The SV File System 


The primary function of SV is to provide a convenient
interface to the data storage facilities of the Datacomputer:
the tertiary store and the staging device. A Datacomputer
file as seen by the Request Handler programmer consists
of a number of "sections", each of which is an ordered
set of pages. (For convenience, SV pages are the same size
as TENEX pages - 512 36-bit words.)


2.4.2 The Directory System


The Datacomputer file system may be thought of as a
tree-structured hierarchy. At the top of the tree is a node
whose conventional name is "%TOP". There are two types of
nodes in the directory system - terminal and non-terminal.
Terminal nodes (files) contain only data, and non-terminal
nodes (directories) contain other nodes which exist at a
lower level in the tree. Node creation is independent of
the intended use of the node. In other words, a node in the
tree is created, then at a later time it is specified
whether it is terminal (a file) or non-terminal (a directory).
Levels of the hierarchy are specified as a list of
names connected by periods, such as "%TOP.DFTP.CCA". In the
example, %TOP and DFTP are non-terminal nodes, and CCA may
be either terminal or non-terminal (in the example, not
enough context is present to determine which).
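The two-step life of a node (created first, typed later) and the period-separated pathname can be illustrated with a small sketch. None of this is Datacomputer code: the class, the function names, and the spelling of the top node as "%TOP" are assumptions made for the example.

```python
class Node:
    """One node of the tree; its kind is fixed some time after creation."""

    def __init__(self, name):
        self.name = name
        self.kind = None      # None until declared "file" or "directory"
        self.children = {}

    def declare(self, kind):
        if kind not in ("file", "directory"):
            raise ValueError(kind)
        self.kind = kind


def create(parent, name):
    """Create an untyped node under a non-terminal (directory) node."""
    if parent.kind != "directory":
        raise ValueError(f"{parent.name} is not a directory")
    child = Node(name)
    parent.children[name] = child
    return child


def resolve(top, pathname):
    """Follow a period-separated pathname down from the top node."""
    names = pathname.split(".")
    if names[0] != top.name:
        raise KeyError(names[0])
    node = top
    for name in names[1:]:
        node = node.children[name]   # KeyError if the node does not exist
    return node


top = Node("%TOP")
top.declare("directory")
dftp = create(top, "DFTP")
dftp.declare("directory")
cca = create(dftp, "CCA")            # exists, but not yet file or directory

print(resolve(top, "%TOP.DFTP.CCA").kind)  # None
```

As in the text, the pathname alone does not say whether CCA is terminal; that is known only once the node has been declared.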



In addition to maintaining the directory hierarchy, the 
directory system provides protection for contents of nodes, 
whether other nodes or data. This protection takes the form 
of a set of "privilege tuples" associated with each node. A
privilege tuple describes two things: the set of privileges
allowed (or denied) to the user accessing the node, and the 
specification of the class of users to whom this particular 
set of privileges applies. 

Just to give the flavor of privilege tuple application, 
one tuple might specify that, for a particular node, a user 
may login to the node and create new nodes under it, but 
only if connected to the Datacomputer from socket number 
1000001 on ARPANET host number 31. Another privilege tuple 
might grant the same privileges to any user who knows that 
the password is "WASHINGTON". A third might only grant read 
access to files under that node to users giving the password 
"DC". For a full discussion of privilege tuples, please 
refer to the latest Datalanguage manual. 
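The three example tuples above can be modelled roughly as follows. This is a sketch under stated assumptions, not the Datalanguage mechanism: the field names, the privilege names, and the rule that denials override grants are all our own choices for illustration, and the Datalanguage manual remains the authority on how tuples are actually evaluated.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class PrivilegeTuple:
    privileges: FrozenSet[str]       # e.g. {"login", "create"}
    grant: bool = True               # False would deny instead of allow
    host: Optional[int] = None       # None matches any ARPANET host
    socket: Optional[int] = None     # None matches any socket
    password: Optional[str] = None   # None means no password is required

    def applies_to(self, host, socket, password):
        return ((self.host is None or self.host == host)
                and (self.socket is None or self.socket == socket)
                and (self.password is None or self.password == password))


def allowed(tuples, host, socket, password):
    """Privileges granted by matching tuples, minus any that are denied."""
    granted, denied = set(), set()
    for t in tuples:
        if t.applies_to(host, socket, password):
            (granted if t.grant else denied).update(t.privileges)
    return granted - denied


node_tuples = [
    PrivilegeTuple(frozenset({"login", "create"}), host=31, socket=1000001),
    PrivilegeTuple(frozenset({"login", "create"}), password="WASHINGTON"),
    PrivilegeTuple(frozenset({"read"}), password="DC"),
]

print(sorted(allowed(node_tuples, 31, 1000001, None)))  # ['create', 'login']
print(sorted(allowed(node_tuples, 5, 7, "DC")))         # ['read']
```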

This external view of the Datacomputer's file system -
a tree-structured hierarchy with multiple protection classes
enforced on each node in the tree - is dealt with transparently
by the Request Handler. This means that the structure
seen by, and the functions available to, the ultimate Datacomputer
user are essentially the same as those provided by
Services to the Request Handler.



2.4.3 Access to Datacomputer Files

As mentioned above, a Datacomputer file is stored as a 
number of sections, each of which is broken into 512 word 
blocks called pages. When the Request Handler wishes to 
access some page of a Datacomputer file, the following 
sequence of events must take place: 

1) The file is opened. To open a file, RH supplies 
SV with the string representing the file's path- 
name in the Datacomputer file system (along with 
any needed passwords). SV determines that the 
current user is allowed to access the file in the 
manner requested (and the file exists), then 
returns a small integer, known as a Relative File 
Number or RFN. The RFN is the handle used by RH 
in all future references to the file until it is 
closed, at which time the RFN becomes invalid. 

2) A buffer is allocated in the user process's
address space. Buffers are managed by SV, but
their allocation, freeing, and use are under the
control of RH. A buffer is exactly the same size
as a Datacomputer file page (and as a TENEX page).
The buffer is identified by yet another small
integer returned by SV.



3) If the page is being read (data already exists and
is being referenced), an SVCALL known as PGRD is
executed. This takes the RFN of the file, the
section number, the page number within the section,
and the buffer number into which the page
is to be read as inputs. After the call, the page
is available in the buffer.

4) If the page is being created, data is first
entered into the buffer by the Request Handler,
then the page is written to the file by the SVCALL
PGWR. Arguments are the same as with PGRD.

5) If the page is being modified, the sequence is
PGRD, modify, PGWR.

6) When the Request Handler is through with the 
buffer and the file, the buffer is released by an 
explicit SVCALL, and the file is closed. 
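The six steps above can be walked through with a toy model. It is only a sketch: PGRD, PGWR, the RFN, and the buffer number come from the text, while the class, the other method names, the in-memory storage, and the example pathname are hypothetical stand-ins, and privilege checking is omitted.

```python
import itertools

PAGE_WORDS = 512  # an SV page is 512 36-bit words


class MockServices:
    """Toy in-memory stand-in for SV; not the real SVCALL interface."""

    def __init__(self):
        self.files = {}        # pathname -> {(section, page): list of words}
        self.open_files = {}   # RFN -> pathname
        self.buffers = {}      # buffer number -> list of PAGE_WORDS words
        self._rfns = itertools.count(1)
        self._bufs = itertools.count(1)

    def open(self, pathname):
        if pathname not in self.files:
            raise FileNotFoundError(pathname)
        rfn = next(self._rfns)
        self.open_files[rfn] = pathname
        return rfn

    def close(self, rfn):
        del self.open_files[rfn]           # the RFN is invalid from here on

    def get_buffer(self):
        buf = next(self._bufs)
        self.buffers[buf] = [0] * PAGE_WORDS
        return buf

    def release_buffer(self, buf):
        del self.buffers[buf]

    def pgrd(self, rfn, section, page, buf):
        pages = self.files[self.open_files[rfn]]
        self.buffers[buf][:] = pages[(section, page)]

    def pgwr(self, rfn, section, page, buf):
        pages = self.files[self.open_files[rfn]]
        pages[(section, page)] = list(self.buffers[buf])


sv = MockServices()
sv.files["%TOP.DFTP.CCA"] = {}     # pretend the file already exists
rfn = sv.open("%TOP.DFTP.CCA")     # step 1: open, receive an RFN
buf = sv.get_buffer()              # step 2: allocate a buffer
sv.buffers[buf][0] = 42            # step 4: fill the buffer, then
sv.pgwr(rfn, 0, 0, buf)            #         create the page
sv.pgrd(rfn, 0, 0, buf)            # step 5: read,
sv.buffers[buf][0] += 1            #         modify,
sv.pgwr(rfn, 0, 0, buf)            #         write back
sv.release_buffer(buf)             # step 6: release the buffer
sv.close(rfn)                      #         and close the file
```

The point of the model is the division of labour: SV hands out the small integers and owns the pages, while RH decides when buffers are allocated, filled, and released.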

2.4.4 The SV Input/Output System and Monitor 

The input/output and monitor facilities provided by
Services are fairly simple when compared with the directory
system. Input/output consists primarily of a set of connections
to the ARPANET, with the ability to read and write
buffers of data to/from a given connection. A special set
of SVCALL's is provided for communication with the Datacomputer
operator's console. The operator is consulted before
particularly large requests are executed for user jobs, and
certain kinds of messages about the state of the Datacomputer
are routed there.


The Services monitor provides no particular facilities 
of its own, but is responsible for the creation/destruction 
of TENEX forks which represent particular Datacomputer sub- 
jobs. As users contact the Datacomputer via the network, 
they are assigned to a particular sub-job by the master 
process known as "Job 0", which is just like any other Data- 
computer process, except it has the monitor code enabled. 


2.5 The User's Level - RH 


The "outermost" level of the Datacomputer is known as 
the Request Handler. RH is in some sense an application 
program, since it is possible for a reasonably naive user to 
interact directly with it, via a specialized data-management 
language known as "Datalanguage". It would not be unreasonable
to consider Datalanguage as the Datacomputer's order
code.






The Datacomputer maintains one or more input/output 
channels for the user. These are called "ports". All Data- 
language interactions flow over a particular port known as 
the "default port" or the "Datalanguage port". This port is 
the connection established when the user first contacts the 
Datacomputer from the ARPANET. Data may flow over the 
default port or over auxiliary ports which are created by 
Datalanguage statements as the session progresses. It is 
preferable to use auxiliary ports for data for two reasons:
first, only ASCII data may pass through the default port;
and second, even though the data being passed is ASCII, care 
must be taken to insure that it contains no characters which 
are treated specially when passed through the default port. 

Datalanguage statements fall into two categories:
commands and requests. In general, commands control the
state of the user's Datacomputer process: open and close
files, create nodes, modify privilege tuples, etc. Requests
refer directly to the contents of files. A large part of 
Datalanguage is devoted to the detailed description of the 
contents of files, and the Request Handler makes extensive 
use of such descriptions in planning its actions. 




When the user first connects to the Datacomputer, 
Services initializes a new Datacomputer process, then passes 
control to the Request Handler. RH does some initialization 
of its own, then asks SV for the next line of input from the 
Datalanguage port. If the input line is a command, it is 
executed immediately. Requests are compiled, then executed. 
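The difference in handling between commands and requests can be pictured as a simple dispatch on each input line. The sketch below is illustrative only: the keyword set, the function names, and the sample input lines are placeholders rather than Datalanguage syntax, and the "compiler" is a stub.

```python
COMMANDS = {"OPEN", "CLOSE", "CREATE"}  # illustrative keywords only


def compile_request(line):
    """Placeholder for the compiler: returns an opaque 'compiled request'."""
    return ("compiled", line)


def handle_line(line, log):
    """Classify one line from the Datalanguage port and act on it."""
    words = line.split()
    if not words:
        return
    if words[0].upper() in COMMANDS:
        log.append(("execute command", line))       # executed immediately
    else:
        compiled = compile_request(line)             # compiled first,
        log.append(("execute request", compiled))   # then executed


log = []
for line in ["OPEN F", "<some request text>"]:
    handle_line(line, log)
```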







The Request Handler's compiler is invoked for most requests.
(A special subset of easy-to-handle requests is
interpreted by a special module known as "SLURP".) The compiler
consists of three parts.

1) The first phase of compilation is handled by a 
routine known as the "pre-compiler". The pre- 
compiler takes the request as received from the 
user, does validity/syntax checking, and produces 
a new representation of the request known as "in- 
termediate language". Intermediate language con- 
sists of a set of functions which are an abstract 
description of the entire set of operations which 
are legal on Datacomputer data. These functions 
are essentially the low-level machine language of 
the Datacomputer. They represent elementary oper- 
ations such as "move an item from container 1 to 
container 2" with appropriate ancillary informa- 
tion such as the type and location of containers 1 
and 2. Most of the "smartness" of the Request 
Handler lies in the pre-compiler. It is completely
responsible for the syntactic and semantic interpretation
of user requests (but not their execution).


2) After the pre-compiler has abstracted and simpli- 
fied the request, the intermediate language gener- 
ated and the descriptions of the real files which 
are named in the request are fed to the rest of 
the compiler. This section is responsible for 
generating the instructions for actually moving 
data from one file (or port) to another under the 
control of the request. The output of this phase 
of the compiler is a data structure which contains 
all the messy loops, skips, and such for plowing 
through and pulling the data specified in the 
format requested from the file. The descriptions 
of these operations are called "tuples". 

3) Finally, the routines which actually execute the 
request on the data are, in some sense, part of 
the compiler. Many of the tuples have distinct 
sub-routines which are responsible for their exe- 
cution, and those routines constitute both the 
run-time environment and part of the compile-time 
data base of the compiler. Because of the multitude
of data-types, byte sizes, etc. allowed by
the Datacomputer, each tuple has many "modes",
which are identified by bits in the data structure.
For any given request, a particular set of
modes is used, and a particular subset of the
tuple code is executed. The last phase of the
compiler walks through the tuple list that defines
the request, and extracts the instructions which
perform the tuple functions as constrained by the
active mode bits in the tuples, producing the
final "compiled request", which is executed with
the real data.



Chapter 3 

Datacomputer Usage 

The dominant fact in usage of the Datacomputer during
the latter half of 1976 was the release of Version 2, with
its incorporation of the terabit memory system. This fulfilled
the Datacomputer's promise of very large, cost-effective
on-line storage, and use of the Datacomputer reflected
that reality. The number of bits stored has increased
dramatically; projects which had filled their available
allocations in system development went into production
as the TBM's facilities became available.


3.1 Seismic Usage


The seismic application of the Datacomputer is dis- 
cussed in more detail in section 6 of this report; at this 
point it suffices to sketch the explosive growth in use of 
the Datacomputer as this application went on line. Begin- 
ning in October, real-time raw seismic data was fed to the 
Datacomputer through the Arpanet at rates of 7 - 12 kilo- 
baud, around the clock. By the end of the year, nearly 70 
billion bits of seismic data -- raw readings, event summaries,
instrument status reports, and various historical
files -- had been stored in the Datacomputer. If stored on 
conventional disk storage, this volume of data would have 
required more than 85 spindles of 100-megabyte drives! 
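Both the spindle comparison and the overall volume are easy to check. The sketch below assumes 8-bit bytes on the 100-megabyte drives, takes a kilobaud as 1000 bits per second, and assumes the feed ran from October 1 through December 31; none of these details is stated in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60

bits_stored = 70e9                      # "nearly 70 billion bits"
drive_bits = 100e6 * 8                  # assumed: 100 megabytes of 8-bit bytes
spindles = bits_stored / drive_bits
print(spindles)                         # 87.5

days = 31 + 30 + 31                     # assumed: October 1 through December 31
low = 7_000 * SECONDS_PER_DAY * days    # assumed: 1 kilobaud = 1000 bits/second
high = 12_000 * SECONDS_PER_DAY * days
print(round(low / 1e9, 1), round(high / 1e9, 1))  # 55.6 95.4
```

Under these assumptions the real-time feed alone would account for roughly 56 to 95 billion bits, a range that brackets the reported total of nearly 70 billion, and the disk comparison works out to 87.5 spindles, in line with "more than 85".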

As the seismic database grew, researchers began retrievals
against it, initially to develop and test procedures
for use in ongoing seismic studies. A special program
was developed at CCA to assist analysis of seismic data by 
providing quick graphs of waveforms from the raw data files; 
and the RDC program developed earlier at CCA was used exten- 
sively in these inquiries. The Applied Seismology Group at 
Lincoln Laboratories also developed several systems for ac- 
cessing seismic data through its PDP-11 UNIX system, and 
were active users of seismic data during this period. 


3.2 DFTP


Growth of user activity in other areas of Datacomputer 
service was somewhat less dramatic, but equally real. The 
most widespread usage continued to be the DFTP system, which 
provides a uniform file archival service for various PDP-10 
systems throughout the Arpanet. Several significant developments
occurred with DFTP during this period: The new file
organization discussed in the previous semi-annual report
was implemented, and most users' data was transferred to
that format. Installation of the TBM allowed usage to grow 
beyond the artificial bounds imposed by disk space available 
in earlier versions of the Datacomputer. A new implementa- 
tion effort extended the class of operating systems on which 
DFTP is available, and thereby its domain of users. 

The new format of DFTP storage collects many files for
a particular user into a single Datacomputer file. The
space savings in the scheme are impressive, but relatively
unimportant (averaging around 60% for most users). More
significant is the reduction in Datacomputer directory overhead
and in TBM accessing which goes along with this more compact
storage. The new format also provides users with additional
directory information and integrity features, and incorporates
several user features which make it considerably
more convenient and effective. By year-end, all but three existing
sites had had their users' data transferred, and the
new version of the program made available to users. The remaining
sites were delayed by coordination difficulties, but
all were transferred before this report was written.


... panded to fill their al- ... reased space when the TBM ...
significant increase in DFTP; as a result, file
storage rose from about 850 to about 1300 megabits in the
second half of the year, and was increasing steadily.

Concurrently with the release of the new DFTP to old 
users, versions of the program were made available to users 
of the ITS, SAIL, and TOPS-20 operating systems. This pro- 
vided complete coverage of the PDP-10 family of systems on 
the network; efforts were underway to spread DFTP to 
MULTICS and UNIX systems during the period. 

In addition to its primary function as an effective 
file storage resource for members of the Arpanet community, 
DFTP served as an example system for groups investigating 
means of accessing the Datacomputer; these included re- 
searchers in Facsimile message processing at the University 
of London, researchers from several ERDA laboratories, and 
the UCLA-Security research group. 


3.3 IMP statistics 






The Arpanet Network Control Center has been an estab- 
lished user of the Datacomputer, storing statistics on per- 
formance of the IMPs which implement the network. This ap- 
plication continued unabated, and artificial limits on the 
amount of data stored were removed with installation of the
TBM. Usage has grown from 200 to 600 megabits during the 
period of this report, with steady retrieval activity 
against the data. The retrieval programs written at BBN in- 
corporate a package of Datacomputer interface subroutines 
(DCSUBR) described in the predecessor to this report. 

3.4 SURVEY 


Another well-established use of the Datacomputer is the 
SURVEY application, carried on in conjunction with MIT's 
Laboratory for Computer Science. Current survey data on the 
status of hosts on the Arpanet continued to be stored at the 
rate of about 10,000 probes per day. Efforts also began to 
restore SURVEY data which had been migrated off-line due to 
space considerations in old versions of the Datacomputer. 
By year-end, all four quarters' data for 1976 were restored, 
raising usage from 350 megabits to about 600. 

The SURVEY database was used during this period by re- 
searchers in the Very Large Data Base project at MIT. They 
used the Datacomputer's processing of requests against 2
quarters' worth of SURVEY data to test their work in estima- 
ting the cost of query processing in very large databases. 
This usage involved approximately 300 more megabits of 
storage.




3.5 ACCAT



The ARPA Command and Control Advanced Testbed, being 
undertaken at the Naval Electronics Laboratory Center (NELC) in
San Diego, became a major focus of interest in the Datacom- 
puter in this period. Initial investigations begun at the 
Stanford Research Institute in 1975 were continued, with 
further demonstrations of the system which interfaced a 
natural language query processor to the Datacomputer. The 
conversion of the database which supports this system to a 
relational format was begun, and steps were also undertaken 
to load a much larger command and control database into a 
version of the Datacomputer to be brought up at NELC. Data- 
computer personnel provided consultation and programming 
support for this conversion and loading task; previously 
implemented user software (like the DCSUBR package) proved 
valuable in this effort. Major research in distributed data 
management will be carried out under a separate contract, 
using the Datacomputer as the basic DBMS. 




3.6 ERDA


Several of ERDA's national laboratories began investi- 
gations of the Datacomputer during the reporting period; 
most active were groups at the Argonne and Lawrence Berkeley 
Laboratories. The major project in this period was begin- 
ning installation of a climatological database by personnel 
at Argonne. This database contains 16 files, each with a 
year's worth of hourly readings for some U.S. city; the 
data are used by a number of sets of programs which model 
energy usage in buildings and communities (CAL-ERDA, ATMES, 
and ACUC systems). 

Interfacing with Argonne personnel provided a test of 
the scope of Datacomputer applicability beyond its previous 
extent: the database was being transferred from a 370/195 
at Argonne, through a Varian network interface, into the 
Datacomputer, and later retrieved for processing at 
Berkeley's CDC 6600. Despite problems with the network in- 
terface at Argonne, this sequence of transfers was success- 
fully accomplished with minimal user programming effort and 
no modifications to the Datacomputer. It is interesting 
that the standard File Transfer Protocol (FTP) programs im- 
plemented at Argonne and Berkeley proved sufficient for ac- 
cessing the Datacomputer, although they might be replaced by
more specialized interface programs at some later date. The 
concept of the Datacomputer as a resource for data sharing 
thus received additional validation. 


3.7 Message Archiving


Under a separate, short term contract, CCA began in- 
vestigations into the use of the Datacomputer for archiving 
network messages. Preliminary investigations dealt with in- 
terfacing to the various message handling programs now 
available on the Arpanet, and designing possible Datacom- 
puter implementations which would be consistent with such 
systems. Experience gained in implementation and operation 
of DFTP proved valuable in this effort. 


3.8 NSW 


The National Software Works project began work on using 
the Datacomputer as the archival system for its historical 
databases as storage became available for that purpose. 
Initial design relied on existing Datacomputer user support 
facilities, such as the DCSUBR package mentioned above. At
the end of the reporting period, consultations were underway 
between Datacomputer and NSW personnel on specifics of file 
description and accessing characteristics. 


3.9 Accounting


As a final task in the implementation of the Datacom- 
puter, a system was implemented for accounting for usage. 
There were several interesting problems involved here: It 
was necessary to choose a set of parameters which at once 
constituted an adequate measure of resource utilization and 
could be made accessible to the accounting system with rea- 
sonable effort. Given this information, the actual process- 
ing of that usage data into meaningful accounts was a rela- 
tively straightforward reporting task. There were certain 
kinds of reporting, however, which were unique to the Data- 
computer's status as a network utility. These involve 
access to shared databases. Questions arise as to who is 
ultimately responsible for the storage and accessing of 
shared data; if charging for these functions is separated, 
the "owner" of a database may lose information on how it is 
being used, and by whom. 




Collection of appropriate usage statistics proved 
simple, as the Datacomputer already supplied more than 
enough statistics in its normal processing; modifications 
to the Datacomputer involved simply formatting the appropri- 
ate counters in a separate file, and adding routines to 
account for storage and directory space used. This informa- 
tion is dumped to a file external to the Datacomputer; from 
there several straightforward programs manipulate it to 
produce usage reports which can serve as the basis for 
billing when needed. Shared databases have their storage 
and accessing charges separated; however, advisory access 
reports are supplied to the official "owners" of data de- 
tailing times, modes and user identities for access to that 
data.

Preliminary reports were produced at year-end, and 
regular reporting is expected to begin monthly in 1977. 




Chapter 4 

Software Development 


Description of work on Datacomputer software will follow
the division between the Services (SV) and Request Handler
(RH) sections of the project introduced in Chapter 2.


4.1 Services




The Services work during this reporting period was
concentrated in five areas: staging routines, terabit memory
(TBM) support, accounting, directory security, and
maintenance/testing.


4.1.1 Staging 


The design and implementation of the staging area
replacement algorithms were accomplished during the first
part of the reporting period. As the staging area or its
directory becomes full, these algorithms are used to
determine which files should be removed to free additional
space.



File selection is based on the amount of user-generated
activity and the number of modified pages that must be copied
back to TBM before the file can be deleted. Three
algorithms were implemented for the Version 2 release. The
primary algorithm involved the Datacomputer monitor (JobO)
periodically running a background process to monitor the
staging area and its associated directory table. When the
space used exceeded a certain threshold, the background
process would begin to migrate updated files back to TBM,
freeing up additional space. In the event the background
process was unable to keep up with user demands for staging
space, two overflow algorithms were implemented to enable
user subjobs to free their own space.
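
The threshold-driven selection described above can be sketched
roughly as follows. This is an illustration under stated
assumptions, not the Datacomputer's code: the names StagedFile
and pick_victims, the 80% threshold, and the ordering rule
(idle files first, then files with the fewest modified pages)
are all hypothetical. The report says only that selection
depends on user activity and on the modified pages that must
be copied back.

```python
from dataclasses import dataclass

@dataclass
class StagedFile:
    name: str
    pages: int            # pages occupied in the staging area
    dirty_pages: int      # modified pages that must be copied back to TBM
    recent_accesses: int  # user-generated activity (hypothetical counter)

def eviction_order(f: StagedFile) -> tuple:
    # Assumed policy: least active files first, then cheapest copy-back.
    return (f.recent_accesses, f.dirty_pages)

def pick_victims(files, capacity, threshold=0.8):
    """Return the files to migrate so that usage falls to the threshold."""
    used = sum(f.pages for f in files)
    limit = int(capacity * threshold)
    victims = []
    for f in sorted(files, key=eviction_order):
        if used <= limit:
            break
        victims.append(f)
        used -= f.pages
    return victims

files = [
    StagedFile("a", pages=400, dirty_pages=0, recent_accesses=0),
    StagedFile("b", pages=300, dirty_pages=50, recent_accesses=9),
    StagedFile("c", pages=250, dirty_pages=10, recent_accesses=0),
]
print([f.name for f in pick_victims(files, capacity=1000)])
```

A background monitor would call something like pick_victims
periodically; the overflow algorithms mentioned above would
apply a similar selection from within a user subjob.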

From observing the operation of the Version 2 Datacom- 
puter, it was discovered that JobO was not always able to 
run the staging area monitor process often enough to prevent 
staging area overflows. This problem was solved in the 
Version 3 Datacomputer by moving the staging area monitor to 
a fork inferior to JobO. The staging routines were also 
enhanced to support multiple staging disks. This gave the 
Version 3 Datacomputer a staging area capacity of up to 
120,000 pages. Several new operator commands were imple- 
mented to facilitate operator control of the staging area. 
These included staging area status and various commands to
enable staging disks to be dynamically added or deleted from
the Datacomputer. For users with applications requiring 
fast access to files, a capability was added to allow files 
to be "frozen" in the staging area. A "frozen" file can
remain staged indefinitely, thus avoiding the overhead that 
might be caused by re-staging. 

4.1.2 Terabit Memory (TBM) Support

One utility was modified and several new utilities 
written to support TBM operation. The Directory Cross 
Checker (XD) is a utility that verifies the internal consis- 
tency of the Datacomputer Directory. XD was modified to 
support off-line TBM Volumes and to build the enormous bit 
patterns required to verify that no two Datacomputer files 
are inadvertently mapped to the same TBM block. 
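
The duplicate-mapping check can be pictured as a bitmap with one
bit per TBM block. The sketch below is a minimal illustration
and not XD itself: the function name, the dictionary of file
storage maps, and the block numbering are assumptions made for
the example.

```python
def find_double_mapped_blocks(file_maps, total_blocks):
    """file_maps: dict of file name -> iterable of TBM block numbers.

    Returns (file, block) pairs where a block was already claimed
    by an earlier file. Block numbers must be below total_blocks.
    """
    seen = bytearray((total_blocks + 7) // 8)  # one bit per TBM block
    conflicts = []
    for name, blocks in file_maps.items():
        for b in blocks:
            byte, mask = b >> 3, 1 << (b & 7)
            if seen[byte] & mask:
                conflicts.append((name, b))   # block mapped twice
            else:
                seen[byte] |= mask            # first claim on this block
    return conflicts

maps = {"f1": [0, 1, 2], "f2": [2, 3], "f3": [4]}
print(find_double_mapped_blocks(maps, total_blocks=16))
```

One bit per block keeps the table small relative to the store
it describes, which matters when the number of blocks runs into
the hundreds of thousands.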

Because the blocks on TBM tapes eventually wear out, 
two utilities were written to create backup copies of Data- 
computer files. The first utility is automatically run by 
the Datacomputer every night. It walks the Datacomputer's
directory tree and creates backup copies of recently crea- 
ted/updated files. The storage maps for all copies of a 
file are chained together on the file's directory page. In 
the event that a TBM tape is accidentally damaged, the 
storage map pointers for all affected files can be simply
changed to point to their previous file copies, resulting in 
little or no loss of data. 
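
The chained storage maps and the fall-back step can be sketched
as a linked list of copies per file. This is a hypothetical
model: the StorageMap class, its fields, and the fall_back
function are invented for illustration, and the real directory
page layout is not given in this report.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class StorageMap:
    tape: str                                 # TBM tape holding this copy
    blocks: List[int]                         # blocks occupied on that tape
    previous: Optional["StorageMap"] = None   # older backup copy, if any

def fall_back(directory, damaged_tape):
    """Repoint each file whose current copy lies on damaged_tape.

    directory maps file name -> current StorageMap.
    Returns the names of files with no surviving copy.
    """
    lost = []
    for name, current in directory.items():
        copy = current
        while copy is not None and copy.tape == damaged_tape:
            copy = copy.previous              # walk the chain of older copies
        if copy is None:
            lost.append(name)
        else:
            directory[name] = copy
    return lost

old = StorageMap("T1", [1, 2])
new = StorageMap("T2", [7, 8], previous=old)
directory = {"f": new, "g": StorageMap("T2", [9])}
print(fall_back(directory, "T2"), directory["f"].tape)
```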

Since space on TBM tapes is never physically deleted, a 
utility was written to copy the latest version of all unde- 
leted files from one tape to another. The old tape can then 
be archived or reused. 

A new version of the Datacomputer software will fre- 
quently contain directory modifications that make it incom- 
patible with older versions. The normal procedure for up- 
grading a database for a new software release was to dump 
all data and directory information and reload everything 
with the new version. With the large amounts of data stored 
on TBM this is no longer possible. A utility was implement- 
ed to dump and reload only the directory information. 
Lastly, TBM tapes are now mountable/demountable per operator 
command. This permits very large or seldom used files, or 
full TBM tapes, to be stored off-line and mounted only when 
needed by specific users. 

4.1.3 Accounting



The accounting package provides a facility for
monitoring costs incurred by individual users accessing the
Datacomputer. Services support for this effort included
generating a history file of dynamic charges (i.e., pages
read/written, CPU time) incurred during user sessions and
implementing a utility routine. This utility is run at the
end of a billing period. It walks the Datacomputer's
directory tree and records static charges (i.e., file space,
directory space) associated with each node in the directory.
A facility to allow the database administrator to mark
certain nodes as billable and to ensure that all users are
logged in beneath a billable node was also implemented.
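
The end-of-period directory walk can be sketched as a recursive
traversal that accumulates static charges. The rule used here,
rolling each node's space up to its nearest billable ancestor,
is an assumption made for the example; the report states only
that static charges are recorded per node and that certain
nodes are marked billable. The Node class and its field names
are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    file_pages: int = 0     # static charge: file space
    dir_pages: int = 0      # static charge: directory space
    billable: bool = False  # set by the database administrator
    children: list = field(default_factory=list)

def static_charges(root):
    """Total pages per billable node, charging each node's space to
    the nearest billable node at or above it (assumed rule)."""
    totals = {}

    def walk(node, payer):
        if node.billable:
            payer = node.name
        if payer is not None:
            totals[payer] = totals.get(payer, 0) + node.file_pages + node.dir_pages
        for child in node.children:
            walk(child, payer)

    walk(root, None)
    return totals

tree = Node("root", dir_pages=1, children=[
    Node("cca", dir_pages=2, billable=True,
         children=[Node("f", file_pages=10)]),
    Node("mit", billable=True,
         children=[Node("g", file_pages=5, dir_pages=1)]),
])
print(static_charges(tree))
```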

4.1.4 Directory Security

A general mechanism to allow deferred privileges was
implemented. A deferred privilege is one that is granted or
denied at one level in the directory tree but does not take
effect until the next deeper level. The immediate result of
this was to prevent users from modifying their own space
allocation limits.
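
One way to picture a deferred privilege is as a grant that
becomes active one level below the node where it is recorded.
The sketch below models grants only (not denials) and assumes
that privileges are inherited down the tree; both points, and
all names, are assumptions made for illustration.

```python
def privileges_at(levels, depth):
    """levels: list, root first, of (immediate, deferred) privilege sets.

    Returns the privileges in effect at the node at the given depth:
    immediate grants from that node and its ancestors, plus deferred
    grants from strict ancestors only.
    """
    active = set()
    for i, (immediate, deferred) in enumerate(levels[:depth + 1]):
        active |= immediate
        if i < depth:           # deferred grants skip their own level
            active |= deferred
    return active

path = [
    ({"read"}, set()),             # root
    ({"write"}, {"set_limit"}),    # user's login node
    (set(), set()),                # a node beneath the user
]
print(sorted(privileges_at(path, 1)))
print(sorted(privileges_at(path, 2)))
```

In this model a user at the middle level cannot change the
limit on that node, but the privilege applies to nodes beneath
it, which matches the effect described above.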



4.1.5 Maintenance / Testing 

During this reporting period at least two and occasion- 
ally three Datacomputers were being actively supported. The 
Version 1 Datacomputer was operational throughout the re- 
porting period. It is currently being phased out as the 
last of its database is transferred to the Version 3 Data- 
computer. A preliminary version of the TBM Datacomputer was 
available for limited network access prior to the Version 2 
release. The Version 2 Datacomputer became operational in 
October 1976. 

A great deal of time was spent testing Services code 
for both the Version 2 and Version 3 releases. In addition, 
the first month of the reporting period was spent testing 
and debugging the Datacomputer/TBM interface.


4.2 The Request Handler




During the second half of the year, the majority of the 
work involving the request handler section of the Datacom- 
puter system fell into two areas: software development and 
system maintenance. 


4.2.1 Datacomputer Version 2


Datacomputer Version 2 was released mid-way through the 
reporting period. Although some software development took 
place prior to the release of Version 2, the majority of the 
(Version 2) effort was directed toward testing and system 
integration.


4.2.1.1 Space Allocation



Users have been given the option of exercising more de- 
tailed control over the allocation of space for their files. 
The system continues to supply default allocation parameters 
that will suffice for most applications. However, with the 
advent of the mass memory, the variation in possible file 
sizes has passed the point where default allocation parame- 
ters will serve all users reasonably. 

Allocation of space in the Datacomputer is done in 
terms of physical blocks on the mass memory, each of which 
holds 1,032,192 bits. This block is the basic allocation 
unit, and its size is referred to as an "M", to reflect its 
approximation to a megabit. It is also used when the user 
specifies an allocation limit, as the ",M=" option of the
CREATE and MODIFY commands. Output from a LIST command 
(%DIRECTORY or %INFORMATION options) describes space
allocated, used, or charged, and is now given in Ms correct
to 2 decimal places.
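
The unit arithmetic can be illustrated with a short sketch.
Only the block size of 1,032,192 bits and the two-decimal
reporting come from the text; the function names are
hypothetical.

```python
BITS_PER_M = 1_032_192  # one physical mass-memory block, per the text

def bits_to_m(bits):
    """Express a size in Ms, correct to 2 decimal places."""
    return round(bits / BITS_PER_M, 2)

def blocks_needed(bits):
    """Whole blocks required to hold the given number of bits."""
    return -(-bits // BITS_PER_M)  # ceiling division on integers

print(bits_to_m(516_096), blocks_needed(1_032_193))
```

Because an M is slightly more than a decimal megabit, sizes
quoted in megabits elsewhere in this report are a little
larger than the same sizes expressed in Ms.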


Several options have been added to give users more con- 
trol over various aspects of the space allocation for their 
files. All of these new options have default values which 
are compatible with the internal values used by the Version 
1 Datacomputer; thus applications that were designed for the 
Version 1 system may use default values for the allocation 
options in the Version 2 (and 3) Datacomputer. Most new 
Datacomputer applications may use default values for these 
parameters.


If a file contains variable length containers which may 
be updated, it must have the CHAPTER option specified on it. 
This causes the base to be broken down into smaller pieces, 
called chapters, and thereby eases the problem of expanding 
or shrinking records in the middle of a file. It also 
causes space to be left unused in each chapter to allow for 
possible expansion. 


Two parameters are provided to allow the user to 
specify the manner in which the file is chaptered: CF 
(Chapter Fill level) and CR (Chapter Record count). Both 
options specify an integer value. CR is the number of
records to be included in each chapter and CF is the per- 
centage of a chapter which is to be filled with data when it 
is first written. 

The default value for CF is 80; for CR, it is the
number of records that will fit in a 1 M block, given the
average record size calculated from the file description,
and the (default or explicit) value of CF.
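
The default CR rule can be read as a simple calculation: take
the CF percentage of one M block and divide by the average
record size. The sketch below follows that reading. Rounding
down, and the minimum of one record per chapter, are
assumptions; the function name is hypothetical.

```python
BITS_PER_M = 1_032_192  # one allocation block, in bits

def default_cr(avg_record_bits, cf=80):
    """Records per chapter when CR is not given explicitly.

    cf is the chapter fill level as an integer percentage.
    """
    usable_bits = BITS_PER_M * cf // 100      # portion filled at first write
    return max(1, usable_bits // avg_record_bits)  # assumed floor of one record

print(default_cr(1000), default_cr(1000, cf=100))
```

With the default CF of 80, roughly a fifth of each chapter is
left free so that records can grow without disturbing
neighbouring chapters.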

The %DESCRIPTION option of the LIST command will show
these values for a chaptered file as specified or defaulted. 

Three new parameters were implemented to control the 
space allocated for inversion: IA (unique number of
attribute value pairs), ID (inversion density) and II 
(inversion increment). The Datacomputer Version 2 User 
Guide presents a comprehensive explanation on how and when 
to use these options. The combination of default values 
for these parameters results in an inversion allocation of 1 
M and should suffice for most applications. 

The %DESCRIPTION option of the LIST command will list
the values of IA and ID only if they have been specified by 
the user; it will list II whether it was explicitly set by 
the user or left as a default. 




4.2.1.2 DIRECT Mode


A new mode option, DIRECT, has been added to the OPEN 
and MODE commands. There are now two inversion-related mode 
options for these commands: DIRECT and DEFER. Mode options 
pertain to the internal manner in which the addition of 
records to inverted files is handled by the Datacomputer. 
DEFER is appropriate in almost all cases and is now the 
default. DIRECT causes the system to update the inversion 
data structure for every inverted container’s value as it is 
written to the base and should be specified only when the
number of records being added to the file is exceedingly 
small. DEFER causes the system to batch a group of inverted 
values and update the inversion structure 'occasionally'
during the request. In most cases, this is much more
efficient.
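
The difference between the two modes can be sketched with a toy
inverted index. This is an illustration only: the Inversion
class, the batch size, and the update counter are hypothetical,
and the report does not say how often DEFER flushes its batch.

```python
from collections import defaultdict

class Inversion:
    """Toy inverted index: container value -> list of record numbers."""

    def __init__(self, mode="DEFER", batch_size=1000):
        self.mode = mode
        self.batch_size = batch_size     # assumed; the text says 'occasionally'
        self.index = defaultdict(list)
        self.pending = []                # values waiting for a batched update
        self.index_updates = 0           # passes made over the index structure

    def add(self, recno, value):
        if self.mode == "DIRECT":
            self.index[value].append(recno)   # update on every record
            self.index_updates += 1
        else:
            self.pending.append((value, recno))
            if len(self.pending) >= self.batch_size:
                self.flush()

    def flush(self):
        if not self.pending:
            return
        for value, recno in sorted(self.pending):  # one ordered pass per batch
            self.index[value].append(recno)
        self.pending.clear()
        self.index_updates += 1

values = ["a", "b", "a", "c", "b"]
direct, defer = Inversion("DIRECT"), Inversion("DEFER", batch_size=2)
for n, v in enumerate(values):
    direct.add(n, v)
    defer.add(n, v)
defer.flush()
print(direct.index_updates, defer.index_updates)
```

Both modes end with the same index; DEFER simply touches the
inversion structure fewer times, which is why it suits all but
very small additions.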

4.2.2 Datacomputer Version 3


In the second half of the reporting period, development
efforts were concentrated on the implementation of new 
features for the Version 3 Datacomputer. 



4.2.2.1 File Groups

Under support from a related contract from the ARPA 
Nuclear Monitoring Research Office, the File Group feature 
was implemented and made available to all Datacomputer users
with the release of Version 3. 

Modifications were made to all levels of the compiler 
and the command handler to support this feature. 






The CREATE command was expanded to support the creation 
of a group node. A group node itself is a special kind of 
file which defines the members of the group. Two options 
were added to the description mechanism for files. The 
logical constraint option (,LC=<boolean>) allows the system
to determine which subfiles of a group should be accessed
during the execution of a request. The automatic include
option (,GROUP=<group-pathname>) causes the file to be
included as a subfile of the named group at creation time.

The user must take the initiative in defining and main- 
taining a group's domain using three new commands. INCLUDE 
causes the system to include a file as a subfile in a group. 
At INCLUDE time, the user may also specify a logical con- 
straint. EXCLUDE causes a subfile to be marked as deleted 
from the domain of a group. COMPRESS causes the system to
garbage collect the group's domain, removing all subfiles 
marked as deleted. 


In support of these three new commands, a general
mechanism for internal requests was implemented. An internal
request is a request initiated by the Datacomputer itself
rather than a user. INCLUDE results in an internal request
to append a record to the group's list of members describing
the newly included subfile. EXCLUDE results in an internal
request to update this list, marking the subfile as deleted.
COMPRESS results in the list being rewritten, skipping all
records marked as deleted.




A new option, %DOMAIN, was added to the LIST command.
This option causes the system to read the list of the files 
in the group (its "domain"), and output the names and 
logical constraints of all non-excluded subfiles. 


The pre-compiler, compiler and code generator were
modified to handle requests run on groups. Essentially, a
loop is generated to read the group's domain. For each
non-excluded subfile in the group, an analysis is performed
on the subfile's logical constraint and the request
qualification to determine if the subfile must be accessed.
If it is determined that the subfile should not be accessed,
the 'inner-loop' which would process the subfile is skipped
and the next subfile is considered.
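
The generated loop can be sketched as follows. To keep the
example small, each logical constraint is modelled as a closed
range on a single integer key and the request qualification as
another range; the real constraints are general boolean
expressions. The Subfile class and query_group function are
hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Subfile:
    name: str
    lo: int                 # logical constraint, modelled as lo <= key <= hi
    hi: int
    records: list           # key values stored in this subfile
    excluded: bool = False  # set by EXCLUDE, cleared out by COMPRESS

def query_group(domain, q_lo, q_hi):
    """Return matching keys and the names of subfiles actually opened."""
    hits, opened = [], []
    for sub in domain:                      # outer loop over the group's domain
        if sub.excluded:
            continue
        if sub.hi < q_lo or sub.lo > q_hi:
            continue                        # constraint rules it out: skip inner loop
        opened.append(sub.name)
        hits.extend(k for k in sub.records if q_lo <= k <= q_hi)
    return hits, opened

domain = [
    Subfile("q1", 1, 90, [5, 50, 89]),
    Subfile("q2", 91, 181, [100, 150]),
    Subfile("q3", 182, 273, [200], excluded=True),
]
print(query_group(domain, 80, 120))
```

The saving comes from never opening subfiles whose constraint
cannot overlap the qualification, which matters when each
subfile may have to be staged from tertiary memory.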






4.2.2.2 Functions


Four special arithmetic functions were added to
Datalanguage and made available to all Datacomputer users
with the release of Version 3. Each of these functions may
be used semantically as an expression or on the left hand
side of a relation.

GCDIST returns the great circle distance from position 
A to position B accurate to the nearest nautical mile. 
BEARING returns the great circle bearing from position A to 
position B accurate to the nearest degree. RLDIST returns 
the rhumb line distance from position A to position B
accurate to the nearest nautical mile. COURSE returns the
rhumb line course from position A to position B accurate to
the nearest degree.
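
The four quantities can be illustrated with standard spherical
navigation formulas. The sketch below is an assumption about
the mathematics only; this report does not give the
Datacomputer's algorithms, argument format, or Earth model.
Positions are taken as hypothetical (latitude, longitude) pairs
in degrees with north and east positive, one nautical mile is
taken as one minute of arc, and the lowercase Python names
merely echo the Datalanguage function names.

```python
import math

def _rad(pos):
    """Convert a (lat, lon) pair in degrees to radians."""
    return math.radians(pos[0]), math.radians(pos[1])

def _wrap(dlon):
    """Normalise a longitude difference to the range [-pi, pi)."""
    return (dlon + math.pi) % (2 * math.pi) - math.pi

def gcdist(a, b):
    """Great circle distance in nautical miles (haversine formula)."""
    lat1, lon1 = _rad(a)
    lat2, lon2 = _rad(b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    h = min(1.0, h)  # guard against rounding slightly above 1
    angle = 2 * math.atan2(math.sqrt(h), math.sqrt(1 - h))
    return round(math.degrees(angle) * 60)  # 1 minute of arc = 1 nautical mile

def bearing(a, b):
    """Initial great circle bearing, degrees clockwise from north."""
    lat1, lon1 = _rad(a)
    lat2, lon2 = _rad(b)
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return round(math.degrees(math.atan2(y, x))) % 360

def _rhumb(a, b):
    """Return (dlat, dlon, dpsi, q) for the rhumb line from a to b."""
    lat1, lon1 = _rad(a)
    lat2, lon2 = _rad(b)
    dlat = lat2 - lat1
    dlon = _wrap(lon2 - lon1)
    # Difference in Mercator-projected latitude; undefined at the poles.
    dpsi = math.log(math.tan(math.pi / 4 + lat2 / 2)
                    / math.tan(math.pi / 4 + lat1 / 2))
    q = dlat / dpsi if abs(dpsi) > 1e-12 else math.cos(lat1)
    return dlat, dlon, dpsi, q

def rldist(a, b):
    """Rhumb line distance in nautical miles."""
    dlat, dlon, _, q = _rhumb(a, b)
    return round(math.degrees(math.hypot(dlat, q * dlon)) * 60)

def course(a, b):
    """Constant rhumb line course, degrees clockwise from north."""
    _, dlon, dpsi, _ = _rhumb(a, b)
    return round(math.degrees(math.atan2(dlon, dpsi))) % 360

print(gcdist((0, 0), (0, 1)), bearing((0, 0), (0, 1)))
```

Used on the left hand side of a relation, a function of this
kind would let a request select, for example, all records
within a given distance of a reference position.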

4.2.2.3 Priority


A comprehensive priority mechanism is planned for a 
future version of the Datacomputer. The Datalanguage to 
support this mechanism was implemented during the reporting 
period.





unless overridden. 

A new option was added to the CREATE (node) and MODIFY
commands (,P=<integer>). Its function is to set the
priority limit at a node, similar to the allocation limit
option.

The PRIORITY command was implemented. Its function is 
to override the system default priority level for a session, 
up to a user's priority limit. 

A new privilege option was added to the CREATEP command
(,Q=<integer>). Its function is to override the system
default priority level when matched, without the need for
issuing a PRIORITY command.

4.2.3 Maintenance and Testing



Because two versions of the Datacomputer were released
during this reporting period, a large amount of time was
spent on testing. Our catalogue of test decks was expanded
to ensure comprehensive testing of old and new features.
All of the test decks were run twice, once in August prior
to the release of Version 2 and again in December prior to
the release of Version 3.



Chapter 5 

Hardware / Site / Operations 

During the first half of 1976, an Ampex Tera-Bit Memory 
system (TBM) was installed at CCA. After extensive tests 
during the second half of 1976, the TBM was accepted. 


5.1 Site Improvements


Major site improvements were made to accommodate the
TBM near the end of 1975. An additional air conditioning
unit became operational in January 1976 and in February the
TBM hardware was delivered.


5.2 The TBM 


The TBM system at CCA is a 200 billion bit configura- 
tion with 50 billion bits per drive and four drives. The 
following figure shows the general structure. The TBM's
transfer rate is 6 million bits per second, similar to an 
IBM 3330 type disk. A high speed seek from one end of a 
tape to the other takes approximately 45 seconds. 
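
The quoted figures imply some useful derived numbers, worked
out in the sketch below. The block size is the 1,032,192-bit
allocation unit given in section 4.2.1.1; the block count
ignores any formatting overhead, and the worst-case figure
assumes a full end-to-end seek before a single block transfer.
Both are simplifications made for this example.

```python
DRIVES = 4
BITS_PER_DRIVE = 50_000_000_000   # 50 billion bits per drive
TRANSFER_RATE = 6_000_000         # bits per second
BITS_PER_M = 1_032_192            # allocation block size, section 4.2.1.1
END_TO_END_SEEK_S = 45            # high speed seek across a whole tape

total_bits = DRIVES * BITS_PER_DRIVE
total_blocks = total_bits // BITS_PER_M            # nominal upper bound
block_transfer_s = BITS_PER_M / TRANSFER_RATE      # time to move one block
full_drive_read_h = BITS_PER_DRIVE / TRANSFER_RATE / 3600
worst_case_block_s = END_TO_END_SEEK_S + block_transfer_s

print(f"{total_bits:,} bits, about {total_blocks:,} blocks")
print(f"one block: {block_transfer_s:.3f} s; "
      f"worst case with seek: {worst_case_block_s:.1f} s")
print(f"reading a full drive: {full_drive_read_h:.2f} hours")
```

The gap between a fraction of a second to transfer a block and
tens of seconds to reach it is the reason for staging file
data onto disk before users operate on it.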




By March 1976 data had been transferred to and from the
Ampex TBM, but Ampex continued to work on ironing out
difficulties in the TBM at CCA through the end of June.
Agreement on a mutually acceptable detailed test plan was
reached in early June and formal acceptance testing started
in July. The hardware performed well during this test but
some difficulties were encountered with the internal TBM
software, particularly in the CIU (Channel Interface Unit, a
PDP-11/05) and the SCP (System Control Processor, a
PDP-11/35). A further software test in August demonstrated
that these problems had been corrected and the system was
accepted.


5.3 Operations 


The Version 1 Datacomputer as described in its User
Manual was operational using three 3330 type disk spindles
for storage and provided service over the Arpanet at the
start of this reporting period. When the Version 2
Datacomputer was made publicly available, using the TBM,
some information was transferred into it from the
operational Version 1 Datacomputer and the Version 1
Datacomputer was squeezed down to two 3330 disk spindles to
provide staging room for Version 2.

The Version 2 Datacomputer was provided experimentally
starting in June 1976 using a 2314 type disk for staging.
Starting in October 1976, Version 2 using 3330 type disks
for staging and TBM tape for storage became the standard
Datacomputer.

At the end of 1976, the Version 3 Datacomputer, with
the enhancements described above, became available over the
network.




Chapter 6 

Seismic Data Base Support 

As mentioned in the Introduction, some work on the 
Datacomputer is funded under a separate contract from the 
Nuclear Monitoring Research Office (NMRO) of ARPA. A short 
discussion of this work is included here because of its in- 
timate relation to the work of the primary Datacomputer de- 
velopment contract. 

6.1 Overview







This work is directed toward establishing an on-line, 
real-time data base of seismic information from a world-wide 
network of monitoring sites and toward making this data 
available to computers for seismic analysis and other 
purposes.

Since the system will work in real time and some of the 
data will be retrieved by computers on the Arpanet, the 
Arpanet was chosen as the most appropriate communications 
medium available for the entire system. Seismic data is 
collected at the Seismic Data Analysis Center from sensors 
at various locations around the world. The data is then
transmitted from the SDAC Control and Communications Pro- 
cessor to CCA over the network. At CCA a small highly reli- 
able computer known as the Seismic Input Processor, or SIP, 
absorbs the incoming data and stores it in a disk buffer. 
Periodically, the SIP connects to the Datacomputer (again 
via the network) and bursts the collected data into the 
Datacomputer at a very high rate. A limited amount of data 
from sources that do not report in real time will also be 
sent to the Datacomputer directly. 
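
The store-and-burst role of the SIP can be sketched as a buffer
that absorbs a steady stream and forwards it in large batches.
The class below is a hypothetical model: the report says the
SIP connects "periodically", whereas this example triggers a
burst on a record-count threshold purely to stay short, and the
send callback stands in for the network connection to the
Datacomputer.

```python
from collections import deque

class SeismicInputBuffer:
    """Toy model of the SIP: absorb records, then burst them onward."""

    def __init__(self, burst_threshold):
        self.buffer = deque()                  # stands in for the disk buffer
        self.burst_threshold = burst_threshold # assumed trigger, not from the text

    def receive(self, record):
        """Absorb one incoming real-time record."""
        self.buffer.append(record)

    def maybe_burst(self, send):
        """Send everything buffered in one batch once enough has collected.

        Returns the number of records sent (0 if no burst took place).
        """
        if len(self.buffer) < self.burst_threshold:
            return 0
        batch = list(self.buffer)
        send(batch)                            # one high-rate transfer
        self.buffer.clear()
        return len(batch)

sent = []
sip = SeismicInputBuffer(burst_threshold=3)
for sample in (101, 102, 103):
    sip.receive(sample)
print(sip.maybe_burst(sent.append), sent)
```

Decoupling arrival from delivery in this way lets the real-time
stream continue while the Datacomputer is busy or temporarily
unreachable.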


6.2 The SIP and Network Considerations


The SIP became operational during the first half of 
1976. In the latter part of 1976, NORSAR and LASA data were
being successfully processed through the SIP. 

Observation of network behavior while real time seismic 
data is being sent from SDAC to the SIP and older seismic 
data is being burst from the SIP to the Datacomputer leads 
to the conclusion that the current network capacity at CCA 
is marginal for this application. The difficulty is due to 
limited processor power and very limited reassembly buffers 
in the 516 IMP. Some relief was gained last year when the 
316 TIP at CCA was replaced with a 516 IMP to increase the
bandwidth of CCA's network node. Temporary relief was also 
provided by moving the Lincoln Laboratories VDH that had 
terminated at CCA to another IMP. However, considering the
much higher loads planned in the future, the most appropri- 
ate long run solution to this problem appears to be the in- 
stallation of a properly configured PLURIBUS IMP at CCA. 
Action has been taken to accomplish this and installation is 
expected to take place in 1977. 



Chapter 7 
Summary 

Over the course of this contract the Datacomputer has 
grown from a very basic system into a sophisticated large 
scale data storage facility with substantial data management 
capabilities. At the beginning of this contract the Data- 
computer supported only a limited amount of storage and a 
restricted set of user functions. Data description and man- 
ipulation facilities were limited. Today the system pro- 
vides facilities for data sharing among dissimilar machines 
on the Arpanet, rapid access to large on-line files, storage 
economy through shared use of tertiary store and improved 
access control. 

Internally, all Datacomputer modules fall into one of
two principal subsystems, which are called "Services" and
"Request Handler". The Request Handler is the "outer" layer
of the Datacomputer. It provides the interface to the
Datacomputer user, parses and compiles Datalanguage, formats
and interprets user data, and is in general control of
overall Datacomputer functions. Services modules provide
the lower-level functions internal to the Datacomputer
itself, such as device and network I/O control, core and
disk buffer management, file directory services, and
operating system interface.





7.1 Services

When this contract began, the Services subsystem of- 
fered rudimentary capabilities. Many of the then-existing 
modules have been extensively reworked, and several major 
modules have been added. Major changes or additions have 
been made in the areas of tertiary memory I/O functions, 
testing and support utilities, file security and accounting. 
Each of these facilities is described in more detail below. 


7.1.1 Tertiary Memory Support 



The most important addition to Services is the support 
system for the tertiary memory. This has required implemen- 
tation of the data-staging mechanism known as SDAX, a gener- 
alized data-page moving mechanism ("slosher") capable of 
moving large amounts of data between TBM's, 3330-type disks, 
RP02-type disks and core. SDAX reduces I/O wait times for 
users who have data stored on tertiary memory by staging 
(portions of) file data onto dedicated 3330-type disks. The 
user's file data operations are then performed on the staged
data, and at an appropriate time modified data is written
back to tertiary memory or simply deleted from disk. These
operations are completely transparent to the Datacomputer
user. Implementation of SDAX and slosher routines required
extensive modifications throughout many other Services
modules, including the directory tree, error and
core-management handlers.
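
The staging behaviour described above resembles a write-back
cache. The following Python fragment is a minimal sketch of
that idea only. The class and method names are hypothetical
and do not reflect the actual SDAX or slosher interfaces, and
it assumes one reading of the text: a staged page that was
modified is written back on release, while an unmodified page
is simply dropped from disk.

```python
class StagingArea:
    """Toy write-back staging of pages from a slow store onto fast disk."""

    def __init__(self, tertiary):
        self.tertiary = tertiary   # page number -> bytes (stands in for the TBM)
        self.staged = {}           # page number -> bytes (stands in for staging disk)
        self.modified = set()      # staged pages changed since they were staged

    def read(self, page):
        # Stage the page on first reference, then serve it from the staged copy.
        if page not in self.staged:
            self.staged[page] = self.tertiary[page]
        return self.staged[page]

    def write(self, page, data):
        # Writes touch only the staged copy; tertiary memory is updated later.
        self.staged[page] = data
        self.modified.add(page)

    def release(self, page):
        # Write a modified page back, or simply drop an unmodified one.
        data = self.staged.pop(page)
        if page in self.modified:
            self.tertiary[page] = data
            self.modified.discard(page)
```

In this sketch the caller never addresses tertiary memory
directly, which is the sense in which the staging is
transparent to the user.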


7.1.2 Utility Support Programs 


In addition to, and as an enhancement of, tertiary memory
device support, a number of utility functions were added to
the Datacomputer. The most important of these are:


The directory cross-checker, which exhaustively
examines the internal consistency of the file
directory;



Backup and reload utilities for Datacomputer
files, which [...] 6,000 [...]




Extensive testing facilities, including a "Q- 
tester" which simulates the activity of several 
concurrent network Datacomputer users in creating 
and deleting nodes and files, and reading and 
writing patterned data to these files; 


Greatly improved error-handling facilities. The 
Datacomputer is able to recover and continue from a 
very large number of potential errors, especially 
including device and network I/O problems. 
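
The patterned-data idea behind such a tester can be sketched
as follows. This is an illustration only, not the Q-tester
itself: the pattern function, the naming scheme, and the
in-memory dictionary standing in for the file system are all
assumptions. Each simulated user creates files, writes a
deterministic pattern, reads it back for comparison, and
deletes the file.

```python
import threading


def pattern(file_id, length):
    """Deterministic bytes derived from a file id and the byte offset."""
    return bytes((file_id * 31 + offset) % 256 for offset in range(length))


def simulated_user(user_id, store, lock, errors, files_per_user=5, length=64):
    """Create, write, read back, verify, and delete a series of files."""
    for n in range(files_per_user):
        name = f"user{user_id}.file{n}"      # hypothetical naming scheme
        file_id = user_id * 100 + n
        with lock:
            store[name] = pattern(file_id, length)   # create and write
        with lock:
            data = store[name]                       # read back
        if data != pattern(file_id, length):
            errors.append(name)                      # corruption detected
        with lock:
            del store[name]                          # delete


def run_test(users=4):
    """Run several simulated users concurrently against one shared store."""
    store, lock, errors = {}, threading.Lock(), []
    threads = [
        threading.Thread(target=simulated_user, args=(u, store, lock, errors))
        for u in range(users)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store, errors
```

Because the pattern is a pure function of the file identity
and offset, any data read back can be checked without keeping
a second copy of what was written.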


7.1.3 Security and Accounting


Two other major areas of work have been the addition of
a sophisticated security subsystem and inclusion of an
accounting package. Datacomputer security is provided by a
privilege block and password system which operates at every
level of the directory tree, is user-settable for all nodes
under that user's control, and offers per-user accessibility
to any given file. The accounting package offers a facility
for prorating costs of Datacomputer operation, data storage
and network interface to individual sites accessing the
Datacomputer.




7.2 Request Handler


Key results of the Request Handler development effort 
have fallen into the following categories: data descrip- 
tion, data manipulation, and efficiency enhancing mechan- 
isms. The following subsections will consider each of these 
issues in turn. 

7.2.1 Data Description

Data is stored by the Datacomputer in FILES whose
contents are described in a directory maintained by the
system. Data is transmitted to or from the Datacomputer
through PORTs which are also described in the system
directory. The Datacomputer's role as a central point for
data in a heterogeneous network gives it the rather unusual
requirement of being able to deal with a large variety of
data types and machine representations of data. The data
description facilities were greatly expanded to handle
strings whose character set is EBCDIC or BCD as well as
ASCII; one's complement, two's complement and unsigned
binary integers; and non-binary integers (signed as well as
unsigned octal, decimal and hexadecimal).
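
To illustrate why the representation must be part of the data
description, the sketch below interprets the same bit pattern
under the three binary integer representations named above.
The function name and the representation labels are
hypothetical and are not Datalanguage syntax. The character
set example uses Python's "cp037" codec, which is only one of
several EBCDIC variants.

```python
def decode_integer(bits, width, representation):
    """Interpret the low `width` bits of `bits` under a named representation."""
    if representation not in ("unsigned", "ones_complement", "twos_complement"):
        raise ValueError(f"unknown representation: {representation}")
    mask = (1 << width) - 1
    bits &= mask
    sign_bit_set = (bits >> (width - 1)) & 1
    if representation == "unsigned" or not sign_bit_set:
        return bits                      # non-negative in every representation
    if representation == "twos_complement":
        return bits - (1 << width)       # subtract 2**width
    return -(~bits & mask)               # one's complement: negate the inverted bits


def ebcdic_to_text(raw):
    """Decode bytes as EBCDIC (code page 037 assumed for illustration)."""
    return raw.decode("cp037")
```

For example, the 8-bit pattern 11111110 is 254 as an unsigned
integer, -2 in two's complement, and -1 in one's complement,
so a receiving machine cannot use the bits correctly without
the description.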



A full set of description options was implemented. The
byte size option specifies the number of bits in each byte
of the container. The fill character option specifies which
value the system should use to fill the container if no data
is assigned to it or to pad data which does not fill the
container to its minimum size. A terminator tells the
Datacomputer where to find the end of data in a
variable-length container. There are three terminator
options: count, delimiter and punctuation.

Virtual data is data that is not stored within the file,
but is system maintained and accessible by the user. No
space is allocated for a container having a virtual option
specified; the system supplies a value which may be
retrieved and used like the value of a 'normal' container in
assignments and expressions. The Datacomputer supports two
virtual options, virtual expressions and virtual indices. A
virtual expression may contain any arithmetic or string
operators on constants or other containers within the same
file. When the container is referenced, the system
calculates the value of the expression. A virtual index
supplies the position of the closest list (i.e., the record
number at the outermost level) and may be used in any
expression.
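
The count and delimiter terminators and the fill character
can be illustrated with a short sketch. The function names
and the one-byte count field are assumptions made for the
example. The punctuation terminator is omitted because the
text above does not define it.

```python
def read_counted(buf, pos):
    """Count terminator: a one-byte length (assumed) precedes the data."""
    n = buf[pos]
    start = pos + 1
    return buf[start:start + n], start + n


def read_delimited(buf, pos, delimiter):
    """Delimiter terminator: data runs up to the next delimiter."""
    end = buf.index(delimiter, pos)
    return buf[pos:end], end + len(delimiter)


def pad_to_minimum(value, minimum, fill=b" "):
    """Pad a value with the fill character up to the container's minimum size."""
    return value.ljust(minimum, fill)
```

Each reader returns the value together with the position of
the next container, so successive variable-length containers
can be read in sequence from one buffer.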




7.2.2 Data Manipulation 







From the user's point of view the Datacomputer is a 
remotely-located utility. It would be impractical to use 
such a utility if, whenever the user wanted to access or 
change any portion of his file, the entire file had to be 
transmitted to him. Accordingly, Datalanguage has been
expanded to be self-contained. The user sends a 'request',
which causes the proper functions to be executed at the 
Datacomputer without requiring entire files to be shipped 
back and forth. Requests are composed of one or more state- 
ments which transfer data (assignment statement); group 
statements so that two or more statements are treated as a 
single statement (BEGIN-END block); declare local variables 
(DECLARE statement); selectively execute statements (the IF- 
THEN-ELSE statement and the UNTIL loop); selectively 
receive, transfer, and transmit data (FOR and APPEND loops); 
update containers (UPDATE loop); delete records (REMOVE 
statement); and print messages (COMMENT, ERROR, ALERT, 
ABORT, QUIT statements). 

The first implementation of the UPDATE loop restricted
the containers which could be changed to fixed-length ones.
Variable-length containers are typically the best choice in
terms of storage efficiency and ease of data handling where
strings of characters (such as names and addresses) are
involved. With these considerations in mind, the CHAPTERed
file mechanism was implemented. Chaptering is a technique
which divides a file into parts, called chapters, to
facilitate the insertion and deletion of data. Each chapter
can be independently expanded or contracted to accommodate
growing or shrinking records.
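
The benefit of chaptering can be sketched as follows. This is
a simplified model, not the Datacomputer's storage layout:
the class, its methods, and the fixed number of records per
chapter at load time are assumptions. The point shown is that
a change in the size of one record alters only the chapter
holding it.

```python
class ChapteredFile:
    """Toy model of a file divided into independently sized chapters."""

    def __init__(self, records, records_per_chapter):
        self.chapters = [
            list(records[i:i + records_per_chapter])
            for i in range(0, len(records), records_per_chapter)
        ]

    def locate(self, index):
        """Map a record number to (chapter number, position in chapter)."""
        for c, chapter in enumerate(self.chapters):
            if index < len(chapter):
                return c, index
            index -= len(chapter)
        raise IndexError("record number out of range")

    def update(self, index, value):
        """Replace a record; only its own chapter grows or shrinks."""
        c, i = self.locate(index)
        self.chapters[c][i] = value

    def remove(self, index):
        """Delete a record from its chapter."""
        c, i = self.locate(index)
        del self.chapters[c][i]

    def chapter_sizes(self):
        """Total bytes held in each chapter."""
        return [sum(len(r) for r in chapter) for chapter in self.chapters]
```

Without chapters, lengthening one record in a contiguous file
would require moving every record that follows it.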




In order to provide a full set of computational capa- 
bilities, arbitrarily complex arithmetic, Boolean and string 
expressions are handled by the Datacomputer. In addition to
the representation of different data types a full set of 
conversions from one type to another must be available so 
that assignment, arithmetic and comparison operations across 
data types are possible. 

7.2.3 Efficiency





Although often not as visible to the user as Data- 
language, efficiency considerations during the manipulation 
of data are an important aspect of data management systems,
especially when dealing with large volumes of data. 



For very large files, it is desirable to break up the
data into physically smaller units so that each subfile or
component of the group can be individually accessed,
checkpointed, dumped, validated, etc. File groups were
implemented to give the user control over how to break up
the data into physically smaller, more manageable units,
those that are online or offline, etc. The user may also
specify a logical constraint on each member of a group. A
'logical constraint' is a Boolean which limits the records
in one subfile to having specific values in specified
container(s) of the file. The system responds to each
reference to a group's name by taking the appropriate action
to handle a set of subfiles. The logical constraint is a
very important efficiency consideration. When looping on a
group without a qualification (WITH <Boolean>), each subfile
has to be opened and operated on. With an appropriate
qualification (one based on the logical constraint), the
system need only reference those subfiles selected. An
analysis of the subfile's logical constraint and the request
qualification is performed. If the analysis determines that
no record of the subfile would be selected by the request
qualification, the subfile is skipped. Otherwise the subfile
is accepted for processing.
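
The subfile selection described above can be sketched as an
interval test. The example assumes, purely for illustration,
that each logical constraint and each request qualification
is a closed range on a single integer container named "year".
The actual analysis handles general Booleans, and the names
used here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subfile:
    name: str
    low: int       # logical constraint: low <= year <= high for every record
    high: int
    records: list  # each record is a dict with a "year" entry


def select_subfiles(group, q_low, q_high):
    """Keep only subfiles whose constraint overlaps the qualification."""
    return [s for s in group if s.high >= q_low and s.low <= q_high]


def retrieve(group, q_low, q_high):
    """Open only the selected subfiles and return (opened names, hits)."""
    opened, hits = [], []
    for subfile in select_subfiles(group, q_low, q_high):
        opened.append(subfile.name)
        hits.extend(r for r in subfile.records if q_low <= r["year"] <= q_high)
    return opened, hits
```

A subfile whose constraint cannot overlap the qualification
is never opened, which matters most when that subfile is
offline.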






An inversion is a secondary data structure that the 
Datacomputer can use to improve its efficiency in retrieving 
data by content from a file. Specifically, an entry in the 
inversion is constructed for every container with the 
inversion option. For each data value which occurs for the 
container, the inversion contains pointers to all the outer- 
most list members for which that container has that value. 
When an inverted file is created, storage space is allocated 
for the secondary data structure. Although fixed-length 
inverted strings were previously supported by the Datacom- 
puter, significant improvements have been made. Indirect 
inversion was implemented, permitting the retrieval of 
outermost list members based on the values of inverted con- 
tainers in any inner list. All elementary containers, in- 
cluding variable-length strings, may be inverted. An 
inversion is not only automatically constructed by the Data- 
computer when the file is loaded, it is also automatically 
maintained when data is appended or updated. 
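
The structure described above is what is now commonly called
an inverted index. The following sketch shows the idea for
flat records only. The class and method names are
hypothetical, and inner lists and indirect inversion are not
modelled.

```python
from collections import defaultdict


class InvertedFile:
    """Toy file whose inversion is maintained on append and update."""

    def __init__(self, inverted_containers):
        self.records = []
        # One table per inverted container: value -> list of record numbers.
        self.inversion = {name: defaultdict(list) for name in inverted_containers}

    def append(self, record):
        number = len(self.records)
        self.records.append(record)
        for name, table in self.inversion.items():
            table[record[name]].append(number)
        return number

    def update(self, number, name, value):
        old = self.records[number][name]
        self.records[number][name] = value
        if name in self.inversion:
            # Move the pointer from the old value's entry to the new one.
            self.inversion[name][old].remove(number)
            self.inversion[name][value].append(number)

    def lookup(self, name, value):
        """Record numbers whose container `name` holds `value`."""
        return sorted(self.inversion[name].get(value, []))
```

Retrieval by content then costs one table lookup instead of a
scan of every record, at the price of extra storage and extra
work on each append or update.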

Complex Boolean expressions, those involving several
comparisons, fall into three classes: those with all
comparisons evaluable from the inversion, those containing
no comparisons evaluable from the inversion, and those which
mix the two kinds of comparisons. The first two classes
pose no problems; the Datacomputer will use the inversion to
evaluate expressions in the first category, and not for
expressions in the second category. For those Booleans in
the third category, the Datacomputer decomposes the
expression into two Booleans. The first, based on inverted
comparisons, will be used to retrieve the data. The second,
known as 'direct search' and based on the non-inverted
comparisons, will be applied to only that data selected by
the inverted Boolean.
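
The decomposition can be sketched for the simplest mixed
case. The example assumes that the Boolean is a conjunction
(AND) of equality comparisons. How the Datacomputer treats
other operators is not described here, and all names are
hypothetical.

```python
def build_inversion(records, containers):
    """Build value -> record-number tables for the inverted containers."""
    inversion = {c: {} for c in containers}
    for number, record in enumerate(records):
        for c in containers:
            inversion[c].setdefault(record[c], []).append(number)
    return inversion


def evaluate(records, inversion, comparisons):
    """Evaluate ANDed (container, value) comparisons, inversion first."""
    inverted = [(c, v) for c, v in comparisons if c in inversion]
    direct = [(c, v) for c, v in comparisons if c not in inversion]

    if inverted:
        # Inverted Boolean: intersect the pointer lists, touching no records.
        candidates = set.intersection(
            *(set(inversion[c].get(v, ())) for c, v in inverted)
        )
    else:
        # No inverted comparison: every record is a candidate.
        candidates = set(range(len(records)))

    # Direct search: apply the remaining comparisons to the candidates only.
    return sorted(
        n for n in candidates
        if all(records[n][c] == v for c, v in direct)
    )
```

The direct search reads only the records that survive the
inverted part, so the more selective the inverted
comparisons, the less data is examined.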

The container address table (CAT) is used for fast re- 
trieval of variable-length list members. It is automatic- 
ally created for variable-length list members which contain 
at least one inverted container and for chaptered files, but 
may be specified as an option on the description of a file. 
The CAT provides quick access to list elements and is pri- 
marily a tool for increasing efficiency at run-time. For 
example, to obtain the nth element of a list of variable- 
length, delimited strings with no CAT would require reading 
through the first n-1 elements, searching for delimiters. 
If the same list had a CAT, obtaining the nth element would 
require only loading a pointer from the nth CAT slot. 
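
The difference between the two access paths in this example
can be sketched as follows. The representation of the CAT as
a list of byte offsets is an assumption made for
illustration, as are the function names. Elements are
numbered from 1 to match the text, and every element is
assumed to end with the delimiter.

```python
def build_cat(data, delimiter):
    """Record the starting offset of every delimited element."""
    cat, pos = [], 0
    while pos < len(data):
        cat.append(pos)
        pos = data.index(delimiter, pos) + len(delimiter)
    return cat


def nth_by_scan(data, delimiter, n):
    """Without a CAT: read through the first n-1 elements to find the nth."""
    pos = 0
    for _ in range(n - 1):
        pos = data.index(delimiter, pos) + len(delimiter)
    return data[pos:data.index(delimiter, pos)]


def nth_by_cat(data, delimiter, cat, n):
    """With a CAT: load the offset from the nth slot and read one element."""
    start = cat[n - 1]
    return data[start:data.index(delimiter, start)]
```

Both functions return the same element. The scan does work
proportional to n, while the CAT lookup does not depend on n.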

A virtual index is treated as a pseudo-inversion, as if
it were an inverted container having the list member's
record number as data, although no auxiliary data structure
is maintained.