July 30, 1990
-------------------------------------------------------------------------------

I.
These are the primary tables of the TFD database, release 1.0 (June 1990). The database is formatted as ASCII tables in the files contained in this directory. This format was chosen for distribution because nearly all relational database management systems can import ASCII text files. The data dictionary for this database is contained in the file "tfd.dct," and the actual data in the "*.dat" files.

The following differences exist between the tables as published and those provided here:

(1) Sequence information in the cdnas and elements tables is entered as 5 X 200 basepair entries per record (rather than 4 X 250 or 4 X 240). The elements table contains 960 bp per record as it did previously.

(2) The field names for sequence entries in the cdnas and elements tables have been changed from sequence1, sequence2, etc. to na_seq1, na_seq2, etc.

(3) The elements table contains 112 rather than 184 records in this release. A number of duplicated records and prokaryotic promoters have been removed from the data files provided here. For a true promoter database, the user may want to consider the Eukaryotic Promoter Database (EPD), which is maintained by Phil Bucher of Stanford University.

(4) Entries in the ref_n fields of the tables have been replaced with the U.S. National Library of Medicine Unique Identifiers (UIs) for the corresponding publications.

(5) The cdna field in the factors table contains a "Y" entry if either a genomic or a cDNA clone exists. No "N" entries appear in this field.

(6) This (1.0) release contains 25, 193, 282, and 1432 records for the cdnas, domains, factors, and sites tables, respectively.

II.
The data dictionary for this database is organized as follows:

The organization of the five tables is presented in five sections, with the name of each table indicated at the top of its section. Each table dictionary has four columns, and each row contains the necessary information for one field:

    Column 1: the position of the first character of the field in the complete fixed-length record
    Column 2: the length of the field
    Column 3: the field type (almost always "c" for character)
    Column 4: the name of the field

III.
The program "dynamic," as described in the publication below, may be obtained from the MBCRR at the Dana-Farber Cancer Institute. In this case please write to:

    Temple F. Smith
    Dana-Farber Cancer Institute
    44 Binney Street
    Boston, MA 02115

IV.
Some TFD-associated files are contained in the "gcgs" and "software" subdirectories. For UWGCG users, some GCG-compatible files derived from the SITES table are contained in the "gcgs" subdirectory. For dBASE users, some files useful in converting the database into dBASE format are located in the "software" subdirectory.

V.
The appropriate reference for this database is:

    Ghosh, D. (1990) A Relational Database of Transcription Factors. Nucleic Acids Res 18: 1749-1756.

-------------------------------------------------------------------------------
David Ghosh
NCBI, NIH
Bethesda, MD 20894
Internet address: ghosh@ncbi.nlm.nih.gov
-------------------------------------------------------------------------------
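
Note on section II: for users without a database system that imports fixed-length ASCII records directly, the four-column dictionary layout (start position, length, type, name) can be applied to a record with a few lines of code. The following is only a minimal sketch in Python, not part of the TFD distribution. It assumes that dictionary rows are whitespace-separated and that start positions are counted from 1; neither is confirmed here, so check both against "tfd.dct" before relying on it. The sample rows, the field names "id" and "name", and the sample record are hypothetical (only na_seq1 is a field name mentioned in section I).

```python
from typing import NamedTuple


class Field(NamedTuple):
    start: int   # position of the field's first character (assumed 1-based)
    length: int  # field width in characters
    ftype: str   # field type, e.g. "c" for character
    name: str    # field name


def parse_dictionary(lines):
    """Parse dictionary rows of the assumed form: start length type name."""
    fields = []
    for line in lines:
        parts = line.split()
        # Skip blank lines, table-name headings, and anything not shaped like a row.
        if len(parts) != 4 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        fields.append(Field(int(parts[0]), int(parts[1]), parts[2], parts[3]))
    return fields


def parse_record(record, fields):
    """Slice one fixed-length record into a {field name: value} dict."""
    row = {}
    for f in fields:
        begin = f.start - 1  # convert the assumed 1-based position to a 0-based index
        row[f.name] = record[begin:begin + f.length].rstrip()
    return row


# Hypothetical dictionary rows and record, for illustration only.
sample_dct = [
    "1  6 c id",
    "7 10 c name",
    "17 8 c na_seq1",
]
fields = parse_dictionary(sample_dct)
record = "S00001" + "AP-1".ljust(10) + "TGACTCA".ljust(8)
print(parse_record(record, fields))
```

With a real table, the rows of one section of "tfd.dct" would be passed to parse_dictionary and each line of the matching "*.dat" file to parse_record. Trailing blanks are stripped from every value, which suits padded character fields but should be revisited for any field where trailing spaces are significant.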