README.txt
^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^

version changes:
^^^^^^^^^^^^^^^^

EMUv1.0.17 added explicit options for headers and sentences: [-h yes] and [-s yes]. Both are described under "input parameters" below.

EMUv1.0.18 updates:
1. Handles indel mutations. The codes INS, DEL and INDEL appear in the wtaa column, and the nucleotides/amino acids/number being inserted or deleted appear in the mtaa column.
2. Extracted mutations are not converted to the three-letter amino acid code.
3. Added a column whose value is either MISSENSE or INDEL, depending on the mutation.
4. The column has all variant type possibilities, depending on the extracted mutation: PROTEIN, DNA, RNA.
5. Use EMU_seq_filter_v1.2.pl to handle the new column in the EMU output.
6. Removed the ABG output file.

EMUv1.0.19 updates:
1. Fixed an error in v1.0.18 affecting two mutation patterns.

EMU_seq_filter_v1.2 updates:
1. Handles the column from EMUv1.0.18 and EMUv1.0.19.
2. The previous version would replace EMU's column. The seq_filter type column moved after

A/ Pipeline of Extraction of MUtation:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1. The input of EMU is a text file; each line consists of a tab-delimited triplet of 1) PubMed ID, 2) title and 3) the plain text of the abstract.
2. Run EMU on the input file.
3. Run SEQ_filter on the extracted mutations.

detailed description:
^^^^^^^^^^^^^^^^^^^^^

A2 EMU:
^^^^^^^
EMU needs the following files (the filenames are hard-coded):
AAconversion.pm            % some Perl scripts from Trevor.
HUGOGeneNames.txt          % the list of gene names.
Cell_line_list_short.txt   % the list of cell line names that can be confused with mutations, i.e. cell line names that look like mutations.

syntax:
perl EMUv1.0.16.pl -f input1 [-s yes] [-h yes]

input parameters:
1. [-f] required argument. The input file name follows the option.
   input1 - the input file with tab-delimited PubMed ID, title and abstract in plain text form.
2. [-s yes] optional argument. With this option, EMU processes the input text sentence by sentence.
3. [-h yes] optional argument. With this option, the first line of the input file is a header with the column names. The default is no header.

A3 SEQ_filter:
^^^^^^^^^^^^^^
the seq_filter parser:
syntax:
perl EMU_seq_filter.pl input output
The input file is the output from EMU; the second argument is the name of the seq_filter output file (see Example).
This method needs an internet connection, because it retrieves data from the NCBI server.

Example:
^^^^^^^^
Let PCA_abst_mutation.txt be the input file for EMU that contains the abstracts:
perl EMUv1.0.17.pl -f PCA_abst_mutation.txt
perl EMUv1.0.17.pl -f PCA_paper.txt -s yes   // applies EMU to full paper text (instead of just the abstract) and runs EMU on sentences
perl EMU_seq_filter.pl EMU_1.17_HUGO_PCA_abst_mutation.txt EMU_1.17_HUGO_PCA_abst_mutation_SF.txt

specification of the input files:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
All files are tab-delimited.

the input file of EMU has to look like:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pmid  title  abstract
10021378  Alzheimer's disease: clues from flies and worms.  Presenilin mutations give rise to familial Alzheimer's disease and result in elevated production of amyloid beta peptide. Recent evidence that presenilins act in developmental signalling pathways may be the key to understanding how senile plaques, neurofibrillary tangles and apoptosis are all biochemically linked.
. . .

the output of EMU and the input of the fasta check is:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pmid  organism  mut_pat1  pos_patt  wtaa  mtaa  pos  genes  type
15146458  Humans  g.4870T>C  T  C  4870  ANP32A;ANP32C;PC  GENOM
. . .

the output of seq_filter:
^^^^^^^^^^^^^^^^^^^^^^^^^
pmid  organism  mut_pat1  pos_patt  wtaa  mtaa  pos  genes  type  fasta_check  gi  gene_name  prot_id
10517877  Humans  histidine to aspartic acid.  codon 1104  HIS  ASP  1104  ERCC5  PROTEIN  YES  2073  ERCC5  51988900|REV
. . .
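The file layouts above can also be produced and read from a script. The following is a minimal Python sketch, not part of the EMU distribution: the helper names write_emu_input and read_tab_file are hypothetical. It assumes (a) that tabs and line breaks inside a title or abstract must be collapsed to spaces so each record stays on one tab-delimited line, and (b) that EMU and seq_filter output files begin with the header row shown in the examples. The sample row is the seq_filter example above, split one value per column.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO, Tuple

# Column names copied from the example headers in this README.
EMU_COLUMNS = ["pmid", "organism", "mut_pat1", "pos_patt", "wtaa",
               "mtaa", "pos", "genes", "type"]
SF_COLUMNS = EMU_COLUMNS + ["fasta_check", "gi", "gene_name", "prot_id"]


def _clean(field: str) -> str:
    """Collapse tabs and line breaks into single spaces.

    Assumption: an embedded tab or newline would break the
    one-record-per-line, tab-delimited input layout.
    """
    return " ".join(field.split())


def write_emu_input(records: Iterable[Tuple[str, str, str]],
                    out: TextIO, header: bool = False) -> None:
    """Write (pmid, title, abstract) triplets, one tab-delimited line each.

    Pass header=True only if EMU will be run with [-h yes].
    """
    if header:
        out.write("pmid\ttitle\tabstract\n")
    for pmid, title, abstract in records:
        out.write("\t".join(_clean(f) for f in (pmid, title, abstract)) + "\n")


def read_tab_file(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-delimited file whose first line is a column header
    (assumed for EMU and seq_filter output)."""
    # QUOTE_NONE: apostrophes and quotes in titles are plain text, not CSV quoting.
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


if __name__ == "__main__":
    buf = io.StringIO()
    write_emu_input([("10021378",
                      "Alzheimer's disease: clues from flies and worms.",
                      "Presenilin mutations give rise to familial\nAlzheimer's disease.")],
                    buf)
    print(buf.getvalue(), end="")

    # Row taken from the seq_filter output example, one value per column.
    sf_row = ["10517877", "Humans", "histidine to aspartic acid.", "codon 1104",
              "HIS", "ASP", "1104", "ERCC5", "PROTEIN", "YES", "2073",
              "ERCC5", "51988900|REV"]
    sf_text = "\t".join(SF_COLUMNS) + "\n" + "\t".join(sf_row) + "\n"
    for row in read_tab_file(io.StringIO(sf_text)):
        print(row["genes"], row["wtaa"], row["pos"], row["mtaa"], row["fasta_check"])
```

Writing a header line with header=True should be paired with running EMU with [-h yes]; without that option EMU expects no header, and the header line would be read as a record.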