mirror database of papers by ftp
"mirror" DATABASE FOR PAPERS IN FTP ARCHIVES (1 March 1993)
Thank you to those people who replied to Charles Wells' and my request
for information about FTP archives of papers. He has handed his information
over to me, and everything I have is in
/theory/FTP-sites at theory.doc.ic.ac.uk
Here again is my own entry in the database by way of a template.
The meaning of the fields will be explained further down this message.
comment=Papers by Paul Taylor at Imperial
# phone=+44 71 589 5111 x 5057
# fax=+44 71 581 8024
# address=Dept of Computing, Imperial College, London SW7 2BZ, UK
There follow some answers (not necessarily definitive) to some of the
questions people have asked me about this.
MACHINE READABLE. I would like to stress first the importance of observing
the syntax. It is proposed that this database should be used by a program
in batch mode, so if the syntax isn't right the program will choke - at night.
Even when data is to be used by me as a human, I find it increasingly hard
to keep track of its volume unless people make an effort to set appropriate
"Subject:" lines on their mail, keep information on one line, etc, so that
I can use "grep" to search for things in my 6Mb of email per year.
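A syntax check of the kind a batch program needs might look like the sketch below. It is not the parser used by "mirror" itself. It assumes that each entry starts with a "package" line, that fields are written "key=value" or "key+value" (the latter appended to a default), and that contact fields are written "# key=value". The names parse_entries, FIELD and SAMPLE, and the BloggsJ entry, are hypothetical.

```python
import re

# A field is a key, then "=" (set) or "+" (append to the default), then a value.
FIELD = re.compile(r"^(?P<key>[A-Za-z_]+)(?P<op>[=+])(?P<value>.*)$")

# Hypothetical entry, for illustration only.
SAMPLE = """\
package=BloggsJ
comment=Papers by J. Bloggs
site=ftp.example.org
remote_dir=pub/papers/Bloggs
local_dir+papers/BloggsJ
# email=J. Bloggs <bloggs@example.org>
# phone=+44 71 000 0000
"""


def parse_entries(text, defaults=None):
    """Parse database text into a list of dicts, one per "package" line."""
    defaults = defaults or {}
    entries = []
    current = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        commented = line.startswith("#")
        if commented:
            line = line[1:].strip()
        match = FIELD.match(line)
        if commented:
            # "# key=value" lines carry contact details; other comments are skipped.
            if match and current is not None:
                current["extra"][match.group("key")] = match.group("value")
            continue
        if match is None:
            raise ValueError(f"line {number}: expected key=value, got {raw!r}")
        key, op, value = match.group("key", "op", "value")
        if key == "package":
            current = {"package": value, "extra": {}}
            entries.append(current)
        elif current is None:
            raise ValueError(f"line {number}: {key!r} appears before any package line")
        elif op == "+":
            # Assumed meaning of "+": append to the default for this key.
            current[key] = defaults.get(key, "") + value
        else:
            current[key] = value
    return entries
```

Calling parse_entries(SAMPLE, {"local_dir": "/archive/"}) returns one entry whose local_dir is the default root followed by "papers/BloggsJ". A line that is neither a field nor a comment raises ValueError with its line number, so a malformed entry is caught before the nightly run.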
WHAT IS FTP? Having a personal FTP archive for your papers is like having a
personal journal with no referees and immediate publication. This means that
the constraints on publication, though very loose, are like those on
publishers: if you make it hard for people to subscribe (eg by allowing your
filespace to become disorganised or too large) then they'll stop. Subscribers
can get single issues by manual interactive use of FTP, or standing orders
by using the mirror program.
MY SITE DOESN'T HAVE AN ARCHIVE. Then use someone else's. There are people
from Cambridge, Darmstadt, Aarhus and other places who use the IC archive.
WHAT'S THE DATABASE FOR? Even if automatic mirroring is not used, having a
database in a machine readable format will make it much easier to find files.
Maybe you will want to keep track of your immediate colleagues' work: in
this case you can extract their entries from the database and set just those
up to be mirrored. Although the information is of no use to the mirror program,
this is also a sensible place to keep address information.
A TYPICAL FTP SESSION. Here's how you can get my Cambridge PhD thesis, for
example (my comments are in square brackets):
machine% ftp theory.doc.ic.ac.uk
Connected to beauty.doc.ic.ac.uk.
220 beauty FTP server (Version 6.14 Mon Nov 18 17:45:21 GMT 1991) ready.
Name (theory.doc.ic.ac.uk:pt): anonymous ["ftp" will often do]
331 Guest login ok, send e-mail address as password.
230-<some welcome message>
230-<put "-" at the beginning of your password>
230-<if these messages cause problems>
230 Guest login ok, access restrictions apply.
ftp> cd theory/papers/Taylor [set directory on remote machine]
250 CWD command successful.
ftp> bin [binary mode for dvi (etc) files]
200 Type set to I.
ftp> lcd ftp-imports/Taylor [set directory of your machine]
ftp> hash [tells you how it's going]
Hash mark printing on (8192 bytes/hash mark).
ftp> get thesis.dvi [the file to fetch]
200 PORT command successful.
150 Opening BINARY mode data connection for thesis.dvi (907024 bytes).
##################### [five lines of these]
226 Transfer complete.
local: thesis.dvi remote: thesis.dvi
907024 bytes received in 4.1 seconds (2.1e+02 Kbytes/s)
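The same steps can be scripted. What follows is a minimal sketch using Python's standard ftplib module, not a replacement for the mirror program. The function name fetch_file is hypothetical, and the caller is assumed to supply a connection that is already logged in.

```python
import os


def fetch_file(ftp, remote_dir, filename, local_dir):
    """Fetch one file in binary mode, following the manual session above.

    `ftp` is a logged-in ftplib.FTP connection (or any object with the
    same cwd and retrbinary methods).
    """
    ftp.cwd(remote_dir)                      # like "cd theory/papers/Taylor"
    os.makedirs(local_dir, exist_ok=True)    # like "lcd", but creates the directory
    local_path = os.path.join(local_dir, filename)
    with open(local_path, "wb") as out:
        # retrbinary sets binary (image) mode itself, so it covers "bin" and "get"
        ftp.retrbinary("RETR " + filename, out.write)
    return local_path
```

To repeat the session above, one would create the connection with ftplib.FTP("theory.doc.ic.ac.uk"), call its login("anonymous", ...) method with an e-mail address as the password, call fetch_file(ftp, "theory/papers/Taylor", "thesis.dvi", "ftp-imports/Taylor"), and finish with ftp.quit().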
WHAT ABOUT SOFTWARE? A lot of mirroring of software already goes on; for
example the program which I propose to use was written to maintain a large
archive of general software at Imperial (not to be confused with the TeX &
Computing Theory archive which I maintain). For this reason I don't really
consider it my job to get into the business of archiving software.
Nevertheless, it does seem a good idea to add "papers/" to the local directory
entries, so that there can also be a "software/" tree and a "conferences/" tree.
CONFERENCE ANNOUNCEMENTS? It does seem reasonable to include those too,
so I have added "papers/" to the local directory names for papers, so that
we can have "conferences/" and "software/" too. For a conference, use its
name where a paper would use the author's name.
JOINT PAPERS? Where do you put joint papers in your filing cabinet? I suggest
choosing one of the authors, and then putting a cross reference (ie a short
file which says "see BloggsJ for Bloggs & Smith") in the other directories.
Personal bibliography databases should have complete entries under each
individual author. If most of your work is done within a particular stable
group, that can have an entry in the database as if it were a single person.
ACCENTS & NAME CLASHES? It's not a good idea to use punctuation in filenames,
or to make them too long. Clashes of both surname and initials within our
community should be resolved by personal negotiation, ie agreeing to adopt
initials, if necessary fictitious ones. Choose a mixed case alphabetic version
of your surname and initials and stick to it.
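One way to derive such a name mechanically is sketched below. It assumes the form "surname followed by initials", as in TaylorP or BloggsJ; the function name archive_name is hypothetical.

```python
import unicodedata


def archive_name(surname, initials):
    """Build a mixed-case, purely alphabetic name such as 'TaylorP'."""
    def letters_only(text):
        # Decompose accented characters, then keep only ASCII letters.
        # This drops accents, spaces, hyphens, dots and apostrophes.
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(c for c in decomposed if c.isascii() and c.isalpha())

    surname_part = letters_only(surname)
    # Capitalise the first letter only; keep the rest as the author writes it.
    surname_part = surname_part[:1].upper() + surname_part[1:]
    return surname_part + letters_only(initials).upper()
```

Letters that have no accent-free decomposition (for example "ø") are simply dropped by this sketch, so such names still need a version chosen by hand. Clashes are not detected either; those remain a matter for negotiation.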
WHAT IS THE "mirror" PROGRAM? It was written by Lee McLoughlin in perl and
uses the FTP protocol in much the same way as you would interactively, except
that it's automatic and should be run outside the normal working hours of the
sites concerned. You can get it from src.doc.ic.ac.uk as /packages/mirror/.
It runs under Unix and is a bit shy of non-Unix FTP archives. Nevertheless,
if some Unix site near you is maintaining an FTP archive of a wide range of
papers and software, this is still of benefit to you even though you can't
maintain your own copy on your non-Unix machine.
HOW WILL THE DATA BE MAINTAINED? In the first instance I shall maintain it
in the directory /theory/FTP-sites at theory.doc.ic.ac.uk. You may wish to
mirror this directory, but I would advise against setting up anything too
automatic for the time being. When I am satisfied that the database format
is suitable and the data is correct for the mirror program, I shall ask the
sites concerned to maintain their own databases, and use mirror to keep them
up to date in my archive. In the long run, however, I think it would be
better for each author to maintain this information in his/her own bibtex
bibliography file (see /theory/bibliography/TaylorP.bib for instance) and
then the mirror database can be extracted from this automatically.
WHY THIS FORMAT? Per-author entries mean that you can mirror just the people
you want rather than whole sites. As long as site managers make up their
minds about directory names and stick to them, the information should only
change radically when people move to other institutions (though phone extension
numbers may change). It's sorted by site so that sites can maintain it and
to make mirror/FTP more efficient (it only logs in to each site once).
A version sorted by author would be possible if someone wrote the program.
Sorting by topic is completely impractical, though if people maintained their
own bibtex files it would be possible to use "grep" to search by keyword.
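The keyword search could equally be done by a short program. The sketch below assumes that the bibtex files have been collected in one local directory with names ending in ".bib"; the function name grep_bib is hypothetical.

```python
import re
from pathlib import Path


def grep_bib(directory, keyword):
    """Return (file name, line number, line) for each line mentioning keyword."""
    # Case-insensitive literal match, roughly "grep -i -F".
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for path in sorted(Path(directory).glob("*.bib")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.name, number, line.strip()))
    return hits
```

This matches whole lines only, as grep does, so a keyword found in a title or keywords field still has to be traced back to its bibtex entry by hand.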
WHAT DO THE FIELDS MEAN?
See "mirror" itself and the "defaults" at the end of the Imperial database
for further explanation.
"package" is a handle for the mirror program: you can ask it just to get
a particular piece of software or, in this case, one author's files.
"comment" is copied to the mirror program log
"site" is the Internet address of the FTP archive
"remote_dir" is the directory on that archive
"local_dir" is the directory into which you want to copy. I made a mistake
in my original message: there should be a "+" instead of an "=". This
means that this field is appended to the "local_dir" setting in the
defaults so that you can set the root of your own archive (copy) tree
as you think appropriate. Also, in order to allow conferences and software
in the same database, I've added "papers/" to these settings.
The remaining fields are not recognised by "mirror" and so are commented out.
"# email" is your email address in Internet format. Please put the actual
address in angle brackets <>, so that you can add your name or whatever in front.
"# phone" is your direct office phone number including extension and, where
appropriate, secretary's name and extension. Please use international
and not American format: for example London, New York and Paris are
+4471, +1212 and +331 respectively, not 071, (212) or (1).
"# fax" likewise. Please specify if this is in a public area.
"# address" postal address including country & code.