**************************************
ProAnWin - Protein Analyst for Windows
**************************************
version 3.01

Multiple sequence alignment, analysis of protein sequences and structures, structure-activity relationships, design of protein-engineering experiments

Copyright (c) 1995-97 I. Pika, A. Frolov, V. Ivanisenko, A. Eroshkin

All trademarks and registered names are acknowledged in this document.

The files required to run ProAnWin are distributed as a single compressed (self-extracting) file. Create a directory named "PROANWIN" on your hard disk (for example, on drive C:) and copy the compressed file into it. Unpack the program by typing PAW at the DOS prompt and answering Yes to all questions. Once the archive files have been extracted, start Windows and then start the program.

This program is provided "AS IS", without any warranty, expressed or implied, to you or any other person. The authors will not be liable for incidental, consequential or other damages arising from the use of this software. As the program is under further development, the documentation may not reflect all current program options.
Program content:
Directory:
  Main directory - program modules
  DATA           - files with amino acid physico-chemical properties, manual, examples with input and output files
  ALIGNS         - aligned sequences of 50 protein families

Main program features (* - new features):
- Inputs sequences in the main protein sequence formats (SWISS-PROT, PIR, FASTA, GCG, Clustal);
- Inputs protein 3D structures from Protein Data Bank files (PDB format);
- Makes multiple sequence alignments (both automatic and manual);
- Threads a multiple alignment onto a known 3D structure;
- Inputs data on protein activities/properties or phenotypes;
- Transforms activity values (log(A), ln(A), A/K, A+k, etc.);
* Searches linear and spatial sites that are conserved or variable with respect to specified physico-chemical properties (for example, helical hydrophobic moment);
* Searches linear and spatial sites with high or low values of specified physico-chemical properties (for example, Kyte-Doolittle hydrophobicity);
* Plots sets of different physico-chemical profiles for an individual protein sequence;
* Plots specified physico-chemical profiles for a set of sequences;
- Searches linear sites in a multiple protein alignment, and spatial sites in a protein 3D structure, that influence protein activity/property;
* Plots the average physico-chemical profile for a family of sequences;
* Plots the dispersion of physico-chemical profiles for a family of sequences;
- Analyses relationships between site structural characteristics and protein activities by multiple linear regression analysis;
- Analyses structural differences between proteins divided by functional, evolutionary or other criteria;
- Investigates physico-chemical factors related to activity changes in a set of mutant proteins;
* Simulates protein-engineering experiments and predicts protein activity (automatic mutant generation to increase or decrease protein activity, manual mutant generation);
* Predicts activity for newly sequenced proteins;
* Plots physico-chemical profiles for a protein 3D structure;
- Makes protein 3D pictures (mono and stereo) with sites highlighted;
* Has more than 400 amino acid physico-chemical properties;
- Calculates ten types (functions) of protein site characteristics, including average values, helical moments, beta-strand moments, etc.;
- Saves all results to disk.

Using ProAnalyst, it is possible to obtain new results important in biochemistry, molecular biology, etc., and to design protein-engineering experiments:
1. The program helps to find information that cannot be found by other programs (activity/property-modulating sites, phenotype-defining regions).
2. The user can analyze structure-activity relationships in both sequences and 3D structures.
3. The program makes it possible to generate and test many hypotheses about the role of different sites and their physico-chemical characteristics in protein activity, which is difficult or impossible by "manual" analysis.
4. Structure-activity relationships are searched for by multiple regression analysis, and the results come with statistical estimates of their reliability.
5. The user can work simultaneously with sequences and the 3D protein structure (sites marked in the sequence are visualized in the 3D structure and vice versa).
6. Alongside the conventional physico-chemical characteristics of sites in a sequence (average values and alpha-helical moments), the user can analyze 8 additional characteristics of sequential (linear) sites and 5 characteristics of spatial sites.
7. The program considerably reduces the time needed to create mutant proteins with a desired property.

To investigate a protein/peptide family of interest, you need to have or prepare sequence data file(s).
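Several of the features listed above (profiles for an individual sequence, the average profile of a family and its dispersion) rest on sliding-window physico-chemical profiles. The Python sketch below only illustrates that idea under stated assumptions and is not ProAnWin's own code: it uses the published Kyte-Doolittle hydropathy values, an assumed default window of 9 residues, and, for the family statistics, assumes gap-free sequences of equal length. The function names window_profile and family_profile_stats are hypothetical.

```python
from statistics import mean, pvariance

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(sequence, scale=KYTE_DOOLITTLE, window=9):
    """Mean property value of every full window along the sequence.

    Element i of the result is the mean over residues i .. i+window-1,
    so the profile has len(sequence) - window + 1 points.
    Residues missing from the scale (gaps, X, ...) raise KeyError.
    """
    values = [scale[aa] for aa in sequence.upper()]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    running = sum(values[:window])
    profile = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one residue: add the new one, drop the oldest.
        running += values[i] - values[i - window]
        profile.append(running / window)
    return profile


def family_profile_stats(sequences, scale=KYTE_DOOLITTLE, window=9):
    """Per-position average and dispersion (population variance) of the
    profiles of a sequence family.

    Hypothetical simplification: sequences are gap-free and of equal
    length; real multiple alignments contain gaps that need handling.
    """
    profiles = [window_profile(s, scale, window) for s in sequences]
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("sequences must have equal length")
    columns = list(zip(*profiles))
    return [mean(c) for c in columns], [pvariance(c) for c in columns]
```

For example, window_profile("IIIKKK", window=3) gives four points falling from 4.5 (the hydrophobic III window) to -3.9 (the charged KKK window). Any other amino acid scale can be passed as a dictionary in place of KYTE_DOOLITTLE.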
You can use sequence data files in FASTA (Pearson), PIR, SWISS-PROT, CLUSTAL or GCG format, or in INTERNAL 1 format (3 data files: protein names (*.seq), protein activities or grouping (*.act) and aligned sequences (*.ali); see the examples in the DATA directory), placed in the current directory. Protein 3D structures can be taken from the Protein Data Bank (PDB).

To use the program, follow these steps:
- start the program;
- select the sequences of the family you are going to investigate;
- select a file with the required physico-chemical properties of amino acids;
- load the protein 3D structure (if available);
- define the fragment to investigate (or up to 8 fragments);
- define the factors for analysis;
and so on. All other information can be found in MANUAL.TXT or HELP.

ProAnWin IS USEFUL IN:
- protein structure-function and structure-activity investigations;
- designing proteins and peptides with improved activity;
- making multiple protein alignments and making sense of them;
- studying phenotype-genotype correlations;
- preparing protein 3D pictures with sites highlighted;
- comparative protein sequence analysis.

PUBLICATIONS:
1. Frolov A.S., Pika I.S., Eroshkin A.M. ProMSED: Protein multiple sequence editor for Windows 3.11/95. CABIOS, 1997, 13, 243-248.
2. Morozov B.M., Ivanisenko V.A., Eroshkin A.M., Ugarova N.N. Computer analysis of relations between bioluminescence color and primary structure of beetle luciferases: identification of the sites influencing bioluminescence color. Molec. Biology (Russia), 1996, 30, 1167-1172.
3. Ivanisenko V.A., Pika I.S., Pinin S.I., Fomina T.I., Eroshkin A.M. Studying structure-activity and phenotype-genotype relationships in protein families. Methods, algorithms and applications. Folding and Design, 1996, 1, Suppl., p. 84.
4. Eroshkin A.M., Fomin V.I., Zhilkin P.A., Ivanisenko V.A., Kondrakhin Y.V.
   PROANAL version 2: multifunctional program for analysis of multiple protein sequence alignments and studying structure-activity relationships in protein families. CABIOS, 1995, 11, 39-44.
5. Eroshkin A.M., Zhilkin P.A., Fomin V.I. Algorithm and computer program PROANAL for analysis of relationship between structure and activity in a family of proteins or peptides. CABIOS, 1993, 9, 491-497.
6. Eroshkin A.M., Minenkova O.O., Fomin V.A., Ivanisenko V.A., Ilyichev A.A. Analysis of peptide fragment insertions into major coat protein of bacteriophages M13, f1 and fd. Relation of protein structural characteristics and viability of mutant phages. Molec. Biology (Russia), 1993, 27, 1345-1355.

The installed version is limited to 15 analyzed sequences.

If you have problems running ProAnWin, please consult the manual and HELP carefully first. If you still need advice, please contact the authors by e-mail: eroshkin@vector.nsk.su
or by mail:
State Research Center of Virology and Biotechnology "Vector"
Koltsovo, Novosibirsk Region, 633159 Russia
Tel: (3832) 647774
Fax: (3832) 328831

Ask the authors for the updated ProAnWin version and for the additional new software tools described below.

*************************************************************
ADDITIONAL NEW SOFTWARE TOOLS
ProMSED2, ProAnalyst, PROANAL3
*************************************************************

ProMSED2: MS Windows application for both automatic and manual DNA and protein sequence alignment, editing, comparison and analysis. ProMSED2 is an enhancement of ProMSED made according to users' remarks and suggestions. The program reads the main sequence formats, performs automatic alignment, and supports alignment visualization and editing; it also allows sequences to be aligned interactively while leaving previously aligned regions unchanged. The program has a user-friendly interface. Manual alignment and sequence analysis are facilitated by coloring schemes reflecting amino acid similarity in mutational, physico-chemical and other properties.
Although ProMSED was targeted at protein sequences, it can be used on DNA sequences as well. The program provides a flexible tool for sequence alignment, analysis, visualization, editing and presentation preparation. The program has the following features (+ - new or enhanced features):
+ inputs DNA and protein sequences in NBRF/PIR, Pearson (FASTA), MSF (GCG), EMBL/SwissProt, IntelliGenetics and CLUSTAL formats;
o has an interface and functions like those of other Windows applications (source file view, font changing, marking/unmarking, block and sequence selection, cut and paste, UNDO, etc.);
o loads several sequence families in different windows, adds sequences to an existing alignment, and combines sequences from various files;
+ outputs the alignment in several popular formats;
+ makes presentation-quality color and black-and-white prints of the complete alignment or any selected block;
+ saves the alignment picture as a Windows metafile or bitmap;
o applies automatic alignment interactively (with options to change the alignment parameters) to any selected part of the sequences or to a marked block;
+ calculates the sequence similarity of complete sequences, of any selected subset of sequences or of a marked block, in % and in PAM250 units (PAM250 being a matrix of amino acid similarity);
+ calculates a total sequence similarity value (the average, for %) as an estimate of alignment quality;
+ prints the sequence similarity matrix;
+ sorts sequences by similarity of complete sequences or of a marked block;
+ displays conserved and semiconserved positions;
+ has many amino acid coloring schemes aimed at facilitating manual alignment and understanding protein sequence features. Some of these schemes are: EVOLUTIONARY CONSERVATIVE (reflects amino acid mutational properties), COMPLEX (similarity of amino acids in physico-chemical properties), HYDROPHOBICITY, CHARGE, BIG RESIDUES, ALPHA-HELIX, HELIX-BREAKERS, etc.
  User-defined schemes can be loaded, and the colors of any amino acid group can be changed;
+ searches for subsequences and complex sequence patterns;
o has complete HELP.

ProAnalyst: DOS version of ProAnWin with additional functionality (single and multiple sequence analysis, profile analysis, combinatorial libraries, design of protein-engineering experiments):
o data conversion from several protein sequence formats (FASTA, SWISS-PROT, PIR, CLUSTAL);
o databases with more than 50 amino acid physico-chemical properties;
o input of 3D protein structures in PDB format;
o flexible VISUALIZATION OF PROTEIN 3D STRUCTURES with sites highlighted;
o input of user-defined protein activities, properties or related phenotypes;
o search for SITES INFLUENCING PROTEIN ACTIVITY and analysis of relationships between protein site structural characteristics and protein activities (properties or related phenotypes);
o multiple linear regression analysis of STRUCTURE-ACTIVITY relationships, discriminant analysis and ANOVA;
o intra- and cross-group VARIABILITY analysis;
o GENOTYPE-PHENOTYPE CORRELATION analysis (e.g., for drug resistance in viruses);
o alphabetical and physico-chemical analysis of variations in protein features (in 1D and 3D structures);
o structure-activity determination profile (SAD);
o investigation of physico-chemical factors related to activity or property changes in MUTANT PROTEINS;
o search for motifs in COMBINATORIAL LIBRARIES (peptide, phage-display libraries, etc.) with MOTIF MAPPING onto the target protein;
o design of PROTEIN-ENGINEERING experiments;
o ACTIVITY, PROPERTY AND PHENOTYPE PREDICTION;
o sorting of sequences by protein activity value, protein group number and motifs found;
o mapping of results onto 3D structures and sequences.
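ProAnalyst, like ProAnWin, relates site structural characteristics to protein activities by multiple linear regression. As an illustration only, the Python sketch below fits an ordinary least-squares model activity = b0 + b1*x1 + ... + bk*xk by solving the normal equations; it is not the programs' own implementation, it reports only R-squared rather than the full reliability statistics, and the function names and example data are hypothetical. Each row of X would hold the characteristics of one protein's site (for example, the mean hydrophobicity of a site), and y the corresponding, possibly transformed, activity values.

```python
def _solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan
    elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear predictors or too few proteins")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_activity_model(X, y):
    """Ordinary least squares y ~ b0 + b1*x1 + ... + bk*xk.

    X: one row of site characteristics per protein; y: activities.
    Returns [b0, b1, ..., bk] from the normal equations (A^T A) b = A^T y.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return _solve(ata, aty)


def r_squared(X, y, coef):
    """Fraction of the activity variance explained by the fitted model."""
    pred = [coef[0] + sum(c * float(v) for c, v in zip(coef[1:], row)) for row in X]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("activities are all equal; R-squared is undefined")
    return 1.0 - ss_res / ss_tot
```

With made-up data generated from activity = 1 + 2*x1 + 3*x2, for example X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]] and y = [1, 3, 4, 6, 8], fit_activity_model recovers coefficients close to [1, 2, 3] with R-squared close to 1. Solving the normal equations directly is the simplest route for a handful of predictors; a QR-based solver would be numerically safer for larger or ill-conditioned problems.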
PROANAL3: protein structure-activity analysis and analysis of variations and conservation of physico-chemical properties (for MS-DOS):
o the main features of ProAnWin, with the addition of nonlinear analysis of protein structure-activity relationships;
o enhanced physico-chemical profiles for predicting protein features (functional and variable sites, secondary structures, amphipathic helices, antigenic sites, etc.).
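A common way to quantify amphipathicity, and the usual meaning of a helical moment, is Eisenberg's hydrophobic moment: the length of the vector sum of per-residue hydrophobicities placed at successive angles around the helix axis. The Python sketch below is a textbook formula swapped in for illustration, not the documented method of PROANAL3 or ProAnWin, whose exact definitions may differ. The function names are hypothetical, the hydrophobicity values are supplied by the caller from any amino acid scale, and the 100-degree default corresponds to an ideal alpha-helix (values of roughly 160-180 degrees are commonly used for beta strands).

```python
import math


def hydrophobic_moment(values, angle_deg=100.0, normalize=False):
    """Eisenberg hydrophobic moment of a segment.

    values: per-residue hydrophobicity values, in sequence order.
    angle_deg: rotation between consecutive residues (100 for an alpha-helix).
    normalize: if True, divide by the number of residues (mean moment).
    """
    if not values:
        raise ValueError("segment must contain at least one residue")
    delta = math.radians(angle_deg)
    # Each residue contributes a vector of length h at angle n*delta.
    sin_sum = sum(h * math.sin(n * delta) for n, h in enumerate(values))
    cos_sum = sum(h * math.cos(n * delta) for n, h in enumerate(values))
    moment = math.hypot(sin_sum, cos_sum)
    return moment / len(values) if normalize else moment


def moment_profile(values, window=11, angle_deg=100.0):
    """Mean hydrophobic moment of every full window along a sequence."""
    return [hydrophobic_moment(values[i:i + window], angle_deg, normalize=True)
            for i in range(len(values) - window + 1)]
```

A segment whose hydrophobic residues cluster on one face of the helix gives a large moment, while uniformly hydrophobic or uniformly polar segments give a small one: 18 residues of equal hydrophobicity at 100 degrees complete exactly five turns, so their vectors cancel and the moment is essentially zero.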