ALBERT

All Library Books, journals and Electronic Records Telegrafenberg


Collection
  • Articles  (3,889)
Publisher
  • BioMed Central  (3,889)
  • American Chemical Society
  • American Institute of Physics
  • American Meteorological Society
  • American Physical Society (APS)
Years
  • 2010-2014  (3,889)
  • 1995-1999
  • 1985-1989
  • 1955-1959
  • 1935-1939
  • 1
    Publication Date: 2012-12-28
    Description: Background: Trypanosoma cruzi, the causal agent of Chagas Disease, affects more than 16 million people in Latin America. The clinical outcome of the disease results from a complex interplay between environmental factors and the genetic background of both the human host and the parasite. However, knowledge of the genetic diversity of the parasite is currently limited to a number of highly studied loci. The availability of a number of genomes from different evolutionary lineages of T. cruzi provides an unprecedented opportunity to look at the genetic diversity of the parasite at a genomic scale. Results: Using a bioinformatic strategy, we have clustered T. cruzi sequence data available in the public domain and obtained multiple sequence alignments in which one or two alleles from the reference CL-Brener were included. These data cover 4 major evolutionary lineages (DTUs): TcI, TcII, TcIII, and the hybrid TcVI. Using this set of alignments we have identified 288,957 high quality single nucleotide polymorphisms and 1,480 indels. In a reduced re-sequencing study we were able to validate ~97% of high-quality SNPs identified in 47 loci. Analysis of how these changes affect encoded protein products showed a 0.77 ratio of synonymous to non-synonymous changes in the T. cruzi genome. We observed 113 changes that introduce or remove a stop codon, some causing significant functional changes, and a number of tri-allelic and tetra-allelic SNPs that could be exploited in strain typing assays. Based on an analysis of the observed nucleotide diversity we show that the T. cruzi genome contains a core set of genes that are under apparent purifying selection. Interestingly, orthologs of known druggable targets show statistically significantly lower nucleotide diversity values. Conclusions: This study provides the first look at the genetic diversity of T. cruzi at a genomic scale. The analysis covers an estimated ~60% of the genetic diversity present in the population, providing an essential resource for future studies on the development of new drugs and diagnostics for Chagas Disease. These data are available through the TcSNP database (http://snps.tcruzi.org).
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 2
    Publication Date: 2012-12-28
    Description: Background: MicroRNAs (miRNAs) are a class of small non-coding RNAs that regulate gene expression by targeting mRNAs for translation repression or mRNA degradation. Although many miRNAs have been discovered and studied in human and mouse, few studies have focused on porcine miRNAs, especially at the genome-wide level. Results: Here, we adopted computational approaches including support vector machine (SVM) and homology searching to perform a global scan for pre-miRNAs in pigs. We built an SVM-based porcine pre-miRNA classifier with a sensitivity of 100%, a specificity of 91.2% and a total prediction accuracy of 95.6%. Moreover, 2204 novel porcine pre-miRNA candidates were found using the SVM-based pre-miRNA classifier. In addition, 116 porcine pre-miRNA candidates were detected by homology searching. Conclusions: We identified porcine pre-miRNAs genome-wide through computational approaches using pig data sets and set up a porcine pre-miRNA library, which provides a genome-level scan of pig pre-miRNAs and should benefit subsequent experimental research on porcine miRNA function and expression.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 3
    Publication Date: 2012-12-28
    Description: Background: Rhizobium tropici CIAT 899 and Rhizobium sp. PRF 81 are alpha-Proteobacteria that establish nitrogen-fixing symbioses with a range of legume hosts. These strains are broadly used in commercial inoculants for application to common bean (Phaseolus vulgaris) in South America and Africa. Both strains display intrinsic resistance to several abiotic stressful conditions such as low soil pH and high temperatures, which are common in tropical environments, and to several antimicrobials, including pesticides. The genetic determinants of these interesting characteristics remain largely unknown. Results: Genome sequencing revealed that CIAT 899 and PRF 81 share a highly-conserved symbiotic plasmid (pSym) that is present also in Rhizobium leucaenae CFN 299, a rhizobium displaying a similar host range. This pSym seems to have arisen by a co-integration event between two replicons. Remarkably, three distinct nodA genes were found in the pSym, a characteristic that may contribute to the broad host range of these rhizobia. Genes for biosynthesis and modulation of plant-hormone levels were also identified in the pSym. Analysis of genes involved in stress response showed that CIAT 899 and PRF 81 are well equipped to cope with low pH, high temperatures and also with oxidative and osmotic stresses. Interestingly, the genomes of CIAT 899 and PRF 81 had large numbers of genes encoding drug-efflux systems, which may explain their high resistance to antimicrobials. Genome analysis also revealed a wide array of traits that may allow these strains to be successful rhizosphere colonizers, including surface polysaccharides, uptake transporters and catabolic enzymes for nutrients, diverse iron-acquisition systems, cell wall-degrading enzymes, type I and IV pili, and novel T1SS and T5SS secreted adhesins. 
    Conclusions: Availability of the complete genome sequences of CIAT 899 and PRF 81 may be exploited in further efforts to understand the interaction of tropical rhizobia with common bean and other legume hosts.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 4
    Publication Date: 2012-12-29
    Description: Background: RNA interference (RNAi) becomes an increasingly important and effective genetic tool to study the function of target genes by suppressing specific genes of interest. This system approach helps identify signaling pathways and cellular phase types by tracking intensity and/or morphological changes of cells. The traditional RNAi screening scheme, in which one siRNA is designed to knockdown one specific mRNA target, needs a large library of siRNAs and turns out to be time-consuming and expensive. Results: In this paper, we propose a conceptual model, called compressed sensing RNAi (csRNAi), which employs the unique combination of group of small interfering RNAs (siRNAs) to knockdown a much larger size of genes. This strategy is based on the fact that one gene can be partially bound with several small interfering RNAs (siRNAs) and conversely, one siRNA can bind to a few genes with distinct binding affinity. This model constructs a multi-to-multi correspondence between siRNAs and their targets, with siRNAs much fewer than mRNA targets, compared with the conventional scheme. Mathematically this problem involves an underdetermined system of equations (linear or nonlinear), which is ill-posed in general. However, the recently developed compressed sensing (CS) theory can solve this problem. We present a mathematical model to describe the csRNAi system based on both CS theory and biological concerns. To build this model, we first search nucleotide motifs in a target gene set. Then we propose a machine learning based method to find the effective siRNAs with novel features, such as image features and speech features to describe an siRNA sequence. Numerical simulations show that we can reduce the siRNA library to one third of that in the conventional scheme. In addition, the features to describe siRNAs outperform the existing ones substantially. 
    Conclusions: This csRNAi system is very promising in saving both time and cost for large-scale RNAi screening experiments which may benefit the biological research with respect to cellular processes and pathways.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 5
    Publication Date: 2012-12-29
    Description: Background: Copy number variations (CNVs) are genomic structural variants that are found in healthy populations and have been observed to be associated with disease susceptibility. Existing methods for CNV detection are often performed on a sample-by-sample basis, which is not ideal for large datasets where common CNVs must be estimated by comparing the frequency of CNVs in the individual samples. Here we describe a simple and novel approach to locate genome-wide CNVs common to a specific population, using human ancestry as the phenotype. Results: We utilized our previously published Genome Alteration Detection Analysis (GADA) algorithm to identify common ancestry CNVs (caCNVs) and built a caCNV model to predict population structure. We identified a 73 caCNV signature using a training set of 225 healthy individuals from European, Asian, and African ancestry. The signature was validated on an independent test set of 300 individuals with similar ancestral background. The error rate in predicting ancestry in this test set was 2% using the 73 caCNV signature. Among the caCNVs identified, several were previously confirmed experimentally to vary by ancestry. Our signature also contains a caCNV region with a single microRNA (MIR270), which represents the first reported variation of microRNA by ancestry. Conclusions: We developed a new methodology to identify common CNVs and demonstrated its performance by building a caCNV signature to predict human ancestry with high accuracy. The utility of our approach could be extended to large case-control studies to identify CNV signatures for other phenotypes such as disease susceptibility and drug response.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 6
    Publication Date: 2012-12-19
    Description: Background: In higher eukaryotes, gene expression is regulated at different levels. In particular, 3′UTRs play a central role in translation, stability and subcellular localization of transcripts. In recent years, the development of high throughput sequencing techniques has facilitated the acquisition of transcriptional data at a genome wide level. However, annotation of the 3′ ends of genes is still incomplete, thus limiting the interpretation of the data generated. For example, we have previously reported two different genes, ADD2 and CPEB3, with conserved 3′UTR alternative isoforms not annotated in the current versions of Ensembl and RefSeq human databases. Results: In order to evaluate the existence of other conserved 3′ ends not annotated in these databases we have now used comparative genomics and transcriptomics across several vertebrate species. In general, we have observed that 3′UTR conservation is lost after the end of the mature transcript. Using this change in conservation before and after the 3′ end of the mature transcripts we have shown that many conserved ends were still not annotated. In addition, we used orthologous transcripts to predict 3′UTR extensions and validated these predictions using total RNA sequencing data. Finally, we used this method to identify not annotated 3′ ends in rats and dogs. As a result, we report several hundred novel 3′UTR extensions in rats and a few thousand in dogs. Conclusions: The methods presented here can efficiently facilitate the identification of not-yet-annotated conserved 3′UTR extensions. The application of these methods will increase the confidence of orthologous gene models across vertebrates.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 7
    Publication Date: 2012-12-20
    Description: Background: Cia5a is a locus on rat chromosome 10 that regulates disease severity and joint damage in two models of rheumatoid arthritis, collagen- and pristane-induced arthritis (PIA). In this study, we aimed to identify cellular and molecular processes regulated by Cia5a using microarray-based gene expression analysis of synovial tissues from MHC identical DA (severe erosive disease) and DA.F344(Cia5a) congenic (mild non-erosive disease) rats. Results: Synovial tissues from six DA and eight DA.F344(Cia5a) rats were analyzed 21 days after the induction of PIA using the Illumina RatRef-12 BeadChip (21,922 genes) and selected data confirmed with qPCR. There was a significantly increased expression of pro-inflammatory mediators such as Il1b (5-fold), Il18 (3.9-fold), Cxcl1 (10-fold), Cxcl13 (7.5-fold) and Ccl7 (7.9-fold), and proteases like Mmp3 (23-fold), Mmp9 (32-fold), Mmp14 (4.4-fold) and cathepsins in synovial tissues from DA, with reciprocally reduced levels in congenics. mRNA levels of 47 members of the Spleen Tyrosine Kinase (Syk) pathway were significantly increased in DA synovial tissues compared with DA.F344(Cia5a), and included Syk (5.4-fold), Syk-activating receptors and interacting proteins, and genes regulated by Syk such as NFkB, and NADPH oxidase complex genes. Nuclear receptors (NR) such as Rxrg, Pparg and Rev-erba were increased in the protected congenics, and so was the anti-inflammatory NR-target gene Scd1 (54-fold increase). Tnn (72-fold decrease) was the gene most significantly increased in DA. Conclusions: Analyses of gene expression in synovial tissues revealed that the arthritis severity locus Cia5a regulates the expression of key mediators of inflammation and joint damage, as well as the expression of members of the Syk pathway. This expression pattern correlates with disease severity and joint damage and along with the gene accounting for Cia5a could become a useful biomarker to identify patients at increased risk for severe and erosive disease. The identification of the gene accounting for Cia5a has the potential to generate a new and important target for therapy and prognosis.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 8
    Publication Date: 2012-12-20
    Description: Background: Traditional candidate gene approach has been widely used for the study of complex diseases including obesity. However, this approach is largely limited by its dependence on existing knowledge of presumed biology of the phenotype under investigation. Our combined strategy of comparative genomics and chromosomal heritability estimate analysis of obesity traits, subscapular skinfold thickness and back-fat thickness in Korean cohorts and pig (Sus scrofa), may overcome the limitations of candidate gene analysis and allow us to better understand genetic predisposition to human obesity. Results: We found common genes including FTO, the fat mass and obesity associated gene, identified from significant SNPs by association studies of each trait. These common genes were related to blood pressure and arterial stiffness (P = 1.65E-05) and type 2 diabetes (P = 0.00578). Through the estimation of variance of genetic component (heritability) for each chromosome by SNPs, we observed a significant positive correlation (r = 0.479) between genetic contributions of human and pig to obesity traits. Furthermore, we noted that the phenotypic variance for obesity can be explained dominantly by chromosome 2, which is syntenic to pig chromosomes 3 and 15. Conclusions: Obesity genetics still awaits further discovery. Navigating syntenic regions suggests obesity candidate genes on chromosome 2 that are previously known to be associated with obesity-related diseases: MRPL33, PARD3B, ERBB4, STK39, and ZNF385B.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 9
    Publication Date: 2012-12-20
    Description: Background: Phenotypic evolution in animals is thought to be driven in large part by differences in gene expression patterns, which can result from sequence changes in cis-regulatory elements (cis-changes) or from changes in the expression pattern or function of transcription factors (trans-changes). While isolated examples of trans-changes have been identified, the scale of their overall contribution to regulatory and phenotypic evolution remains unclear. Results: Here, we attempt to examine the prevalence of trans-effects and their potential impact on gene expression patterns in vertebrate evolution by comparing the function of identical human tissue-specific enhancer sequences in two highly divergent vertebrate model systems, mouse and zebrafish. Among 47 human conserved non-coding elements (CNEs) tested in transgenic mouse embryos and in stable zebrafish lines, at least one species-specific expression domain was observed in the majority (83%) of cases, and 36% presented dramatically different expression patterns between the two species. Although some of these discrepancies may be due to the use of different transgenesis systems in mouse and zebrafish, in some instances we found an association between differences in enhancer activity and changes in the endogenous gene expression patterns between mouse and zebrafish, suggesting a potential role for trans-changes in the evolution of gene expression. Conclusions: In total, our results: (i) serve as a cautionary tale for studies investigating the role of human enhancers in different model organisms, and (ii) suggest that changes in the trans environment may play a significant role in the evolution of gene expression in vertebrates.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 10
    Publication Date: 2012-12-20
    Description: Background: Secretoglobin 1A1 (SCGB 1A1), also called Clara cell secretory protein, is the most abundantly secreted protein of the airway. The SCGB1A1 gene has been characterized in mammals as a single copy in the genome. However, analysis of the equine genome suggested that horses might have multiple SCGB1A1 gene copies. Non-ciliated lung epithelial cells produce SCGB 1A1 during inhalation of noxious substances to counter airway inflammation. Airway fluid and lung tissue of horses with recurrent airway obstruction (RAO), a chronic inflammatory lung disease affecting mature horses similar to environmentally induced asthma of humans, have reduced total SCGB 1A1 concentration. Herein, we investigated whether horses have distinct expressed SCGB1A1 genes; whether the transcripts are differentially expressed in tissues and in inflammatory lung disease; and whether there is cell specific protein expression in tissues. Results: We identified three SCGB1A1 gene copies on equine chromosome 12, contained within a 512-kilobase region. Bioinformatic analysis showed that SCGB1A1 genes differ from each other by 8 to 10 nucleotides, and that they code for different proteins. Transcripts were detected for SCGB1A1 and SCGB1A1A, but not for SCGB1A1P. The SCGB1A1P gene had most inter-individual variability and contained a non-sense mutation in many animals, suggesting that SCGB1A1P has evolved into a pseudogene. Analysis of SCGB1A1 and SCGB1A1A sequences by endpoint-limiting dilution PCR identified a consistent difference affecting 3 bp within exon 2, which served as a gene-specific "signature". Assessment of gene- and organ-specific expression by semiquantitative RT-PCR of 33 tissues showed strong expression of SCGB1A1 and SCGB1A1A in lung, uterus, Fallopian tube and mammary gland, which correlated with detection of SCGB 1A1 protein by immunohistochemistry. 
    Significantly altered expression of the ratio of SCGB1A1A to SCGB1A1 was detected in RAO-affected animals compared to controls, suggesting different roles for SCGB 1A1 and SCGB 1A1A in this inflammatory condition. Conclusions: This is the first report of three SCGB1A1 genes in a mammal. The two expressed genes code for proteins predicted to differ in function. Alterations in the gene expression ratio in RAO suggest cell and tissue specific regulation and functions. These findings may be important for understanding of lung and reproductive conditions.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 11
    Publication Date: 2012-11-09
    Description: Background: The robust identification of isotope patterns originating from peptides being analyzed through mass spectrometry (MS) is often significantly hampered by noise artifacts and the interference of overlapping patterns arising e.g. from post-translational modifications. As the classification of the recorded data points into either 'noise' or 'signal' lies at the very root of essentially every proteomic application, the quality of the automated processing of mass spectra can significantly influence the way the data might be interpreted within a given biological context. Results: We propose non-negative least squares/non-negative least absolute deviation regression to fit a raw spectrum by templates imitating isotope patterns. In a carefully designed validation scheme, we show that the method exhibits excellent performance in pattern picking. It is demonstrated that the method is able to disentangle complicated overlaps of patterns. Conclusions: We find that regularization is not necessary to prevent overfitting and that thresholding is an effective and user-friendly way to perform feature selection. The proposed method avoids problems inherent in regularization-based approaches, comes with a set of well-interpretable parameters whose default configuration is shown to generalize well without the need for fine-tuning, and is applicable to spectra of different platforms. The R package IPPD implements the method and is available from the Bioconductor platform (http://bioconductor.fhcrc.org/help/bioc-views/devel/bioc/html/IPPD.html).
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 12
    Publication Date: 2012-11-09
    Description: Background: Monosporascus cannonballus is the main causal agent of melon vine decline disease. Several studies have been carried out, mainly focused on the penetration of this pathogen into melon roots, the evaluation of symptom severity on infected roots, and screening assays for breeding programs. However, a detailed molecular view on the early interaction between M. cannonballus and melon roots in either susceptible or resistant genotypes is lacking. In the present study, we used a melon oligo-based microarray to investigate the gene expression responses of two melon genotypes, Cucumis melo 'Piel de sapo' ('PS') and C. melo 'Pat 81', with contrasting resistance to the disease. This study was carried out at 1 and 3 days after infection (DPI) by M. cannonballus. Results: Our results indicate a dissimilar behavior of the susceptible vs. the resistant genotypes from 1 to 3 DPI. 'PS' responded with a more rapid infection response than 'Pat 81' at 1 DPI. At 3 DPI the total number of differentially expressed genes identified in 'PS' declined from 451 to 359, while the total number of differentially expressed transcripts in 'Pat 81' increased from 187 to 849. Several deregulated transcripts coded for components of Ca2+ and jasmonic acid (JA) signalling pathways, as well as for other proteins related to defence mechanisms. Transcriptional differences in the activation of the JA-mediated response in 'Pat 81' compared to 'PS' suggested that JA response might be partially responsible for their observed differences in resistance. Conclusions: As a result of this study we have identified for the first time a set of candidate genes involved in the root response to the infection of the pathogen causing melon vine decline. This information is useful for understanding the disease progression and resistance mechanisms a few days after inoculation.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 13
    Publication Date: 2012-11-10
    Description: Background: The inference of homologies among DNA sequences, that is, positions in multiple genomes that share a common evolutionary origin, is a crucial, yet difficult task facing biologists. Its computational counterpart is known as the multiple sequence alignment problem. There are various criteria and methods available to perform multiple sequence alignments, and among these, the minimization of the overall cost of the alignment on a phylogenetic tree is known in combinatorial optimization as the Tree Alignment Problem. This problem typically occurs as a subproblem of the Generalized Tree Alignment Problem, which looks for the tree with the lowest alignment cost among all possible trees. This is equivalent to the Maximum Parsimony problem when the input sequences are not aligned, that is, when phylogeny and alignments are simultaneously inferred. Results: For large data sets, a popular heuristic is Direct Optimization (DO). DO provides a good tradeoff between speed, scalability, and competitive scores, and is implemented in the computer program POY. All other (competitive) algorithms have greater time complexities compared to DO. Here, we introduce a new algorithm, Affine-DO, and present experiments with it; it accommodates the indel (alignment gap) models commonly used in phylogenetic analysis of molecular sequence data. Affine-DO has the same time complexity as DO, but is correctly suited for the affine gap edit distance. We demonstrate its performance with more than 330,000 experimental tests. These experiments show that the solutions of Affine-DO are close to the lower bound inferred from a linear programming solution. Moreover, iterating over a solution produced using Affine-DO shows little improvement. Conclusions: Our results show that Affine-DO is likely producing near-optimal solutions, with approximations within 10% for sequences with small divergence, and within 30% for random sequences, for which Affine-DO produced the worst solutions. The Affine-DO algorithm has the necessary scalability and optimality to be a significant improvement in the real-world phylogenetic analysis of sequence data.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 14
    Publication Date: 2012-11-10
    Description: Background: Extant sauropsids (reptiles and birds) are divided into two major lineages, the lineage of Testudines (turtles) and Archosauria (crocodilians and birds) and the lineage of Lepidosauria (tuatara, lizards, worm lizards and snakes). Karyotypes of these sauropsidan groups generally consist of macrochromosomes and microchromosomes. In chicken, microchromosomes exhibit a higher GC-content than macrochromosomes. To examine the pattern of intra-genomic GC heterogeneity in lepidosaurian genomes, we constructed a cytogenetic map of the Japanese four-striped rat snake (Elaphe quadrivirgata) with 183 cDNA clones by fluorescence in situ hybridization, and examined the correlation between the GC-content of exonic third codon positions (GC3) of the genes and the size of chromosomes on which the genes were localized. Results: Although GC3 distribution of snake genes was relatively homogeneous compared with those of the other amniotes, microchromosomal genes showed significantly higher GC3 than macrochromosomal genes as in chicken. Our snake cytogenetic map also identified several conserved segments between the snake macrochromosomes and the chicken microchromosomes. Cross-species comparisons revealed that GC3 of most snake orthologs in such macrochromosomal segments were GC-poor (GC3 < 50%) whereas those of chicken orthologs in microchromosomes were relatively GC-rich (GC3 ≥ 50%). Conclusion: Our results suggest that the chromosome size-dependent GC heterogeneity had already occurred before the lepidosaur-archosaur split, 275 million years ago. This character was probably present in the common ancestor of lepidosaurs but lost in the lineage leading to Anolis during the diversification of lepidosaurs. We also identified several genes whose GC-content might have been influenced by the size of the chromosomes on which they were harbored over the course of sauropsid evolution.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 15
    Publication Date: 2012-11-10
    Description: Background: Several methods have recently been developed to identify regions of the genome that have been exposed to strong selection. However, recent theoretical and empirical work suggests that polygenic models are required to identify the genomic regions that are more moderately responding to ongoing selection on complex traits. We examine the effects of multi-trait selection on the genome of a population of US registered Angus beef cattle born over a 50-year period representing approximately 10 generations of selection. We present results from the application of a quantitative genetic model, called Birth Date Selection Mapping, to identify signatures of recent ongoing selection. Results: We show that US Angus cattle have been systematically selected to alter their mean additive genetic merit for most of the 16 production traits routinely recorded by breeders. Using Birth Date Selection Mapping, we estimate the time-dependency of allele frequency for 44,817 SNP loci using genomic best linear unbiased prediction, generalized least squares, and BayesCpi analyses. Finally, we reconstruct the primary phenotypes that have historically been exposed to selection from a genome-wide analysis of the 16 production traits and gene ontology enrichment analysis. Conclusions: We demonstrate that Birth Date Selection Mapping utilizing mixed models corrects for time-dependent pedigree sampling effects that lead to spurious SNP associations and reveals genomic signatures of ongoing selection on complex traits. Because multiple traits have historically been selected in concert and most quantitative trait loci have small effects, selection has incrementally altered allele frequencies throughout the genome. Two quantitative trait loci of large effect were not the most strongly selected of the loci due to their antagonistic pleiotropic effects on strongly selected phenotypes. 
Birth Date Selection Mapping may readily be extended to temporally-stratified human or model organism populations.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 16
    Publication Date: 2012-11-11
    Description: Background: Cultivated peanut or groundnut (Arachis hypogaea L.) is an important oilseed crop with an allotetraploid genome (AABB, 2n = 4x = 40). Both the low level of genetic variation within the cultivated gene pool and its polyploid nature limit the utilization of molecular markers to explore genome structure and facilitate genetic improvement. Nevertheless, a wealth of genetic diversity exists in diploid Arachis species (2n = 2x = 20), which represent a valuable gene pool for cultivated peanut improvement. Interspecific populations have been used widely for genetic mapping in diploid species of Arachis. However, an intraspecific mapping strategy was essential to detect chromosomal rearrangements among species that could be obscured by mapping in interspecific populations. To develop intraspecific reference linkage maps and gain insights into karyotypic evolution within the genus, we comparatively mapped the A- and B-genome diploid species using intraspecific F2 populations. Exploring genome organization among diploid peanut species by comparative mapping will enhance our understanding of the cultivated tetraploid peanut genome. Moreover, new sources of molecular markers that are highly transferable between species and developed from expressed genes will be required to construct saturated genetic maps for peanut. Results: A total of 2,138 EST-SSR (expressed sequence tag-simple sequence repeat) markers were developed by mining a tetraploid peanut EST assembly including 101,132 unigenes (37,916 contigs and 63,216 singletons) derived from 70,771 long-read (Sanger) and 270,957 short-read (454) sequences. A set of 97 SSR markers were also developed by mining 9,517 genomic survey sequences of Arachis. An SSR-based intraspecific linkage map was constructed using an F2 population derived from a cross between K 9484 (PI 298639) and GKBSPSc 30081 (PI 468327) in the B - genome species A. batizocoi. 
A high degree of macrosynteny was observed when comparing the homoeologous linkage groups between A (A. duranensis) and B (A. batizocoi) genomes. Comparison of the A - and B - genome genetic linkage maps also showed a total of five inversions and one major reciprocal translocation between two pairs of chromosomes under our current mapping resolution. Conclusions: Our findings will contribute to understanding tetraploid peanut genome origin and evolution and eventually promote its genetic improvement. The newly developed EST-SSR markers will enrich current molecular marker resources in peanut.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 17
    Publication Date: 2012-11-11
    Description: Background: Empirical evaluations of sexually dimorphic expression of genes on the mammalian X-chromosome are needed to understand the evolutionary forces and the gene-regulatory mechanisms controlling this chromosome. We performed a large-scale sex-bias expression analysis of genes on the X-chromosome in six different somatic tissues from mouse. Results: Our results show that the mouse X-chromosome is enriched with female-biased genes and depleted of male-biased genes. This suggests that feminisation as well as de-masculinisation of the X-chromosome has occurred in terms of gene expression in non-reproductive tissues. Several mechanisms may be responsible for the control of female-biased expression on chromosome X, and escape from X-inactivation is a main candidate. We confirmed escape in the case of Tmem29 using RNA-FISH analysis. In addition, we identified novel female-biased non-coding transcripts located in the same female-biased cluster as the well-known coding X-inactivation escapee Kdm5c, likely transcribed from the transition-region between active and silenced domains. We also found that previously known escapees only partially explained the overrepresentation of female-biased X-genes, particularly for tissue-specific female-biased genes. Therefore, the gene set we have identified contains tissue-specific escapees and/or genes controlled by other sexually skewed regulatory mechanisms. Analysis of gene age showed that evolutionarily old X-genes (>100 myr, preceding the radiation of placental mammals) are more frequently female-biased than younger genes. Conclusion: Altogether, our results have implications for understanding both gene regulation and gene evolution of mammalian X-chromosomes, and suggest that the final result in terms of the X-gene composition (masculinisation versus feminisation) is a compromise between different evolutionary forces acting on reproductive and somatic tissues.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 18
    Publication Date: 2012-11-11
    Description: Background: The genomes of three major mosquito vectors of human diseases, Anopheles gambiae, Aedes aegypti, and Culex pipiens quinquefasciatus, have been previously sequenced. C. p. quinquefasciatus has the largest number of predicted protein-coding genes, which partially results from the expansion of three detoxification gene families: cytochrome P450 monooxygenases (P450), glutathione S-transferases (GST), and carboxyl/cholinesterases (CCE). However, unlike An. gambiae and Ae. aegypti, which have large amounts of gene expression data, C. p. quinquefasciatus has limited transcriptomic resources. Knowledge of complete gene expression information is very important for the exploration of the functions of genes involved in specific biological processes. In the present study, the three detoxification gene families of C. p. quinquefasciatus were analyzed for phylogenetic classification and compared with those of three other dipteran insects. Gene expression during various developmental stages and the differential expression responsible for parathion resistance were profiled using the digital gene expression (DGE) technique. Results: A total of 302 detoxification genes were found in C. p. quinquefasciatus, including 71 CCE, 196 P450, and 35 cytosolic GST genes. Compared with three other dipteran species, gene expansion in Culex mainly occurred in the CCE and P450 families, where the genes of alpha-esterases, juvenile hormone esterases, and CYP325 of the CYP4 subfamily showed the most pronounced expansion on the genome. For the five DGE libraries, 3.5-3.8 million raw tags were generated and mapped to 13314 reference genes. Among 302 detoxification genes, 225 (75%) were detected for expression in at least one DGE library. One fourth of the CCE and P450 genes were detected uniquely in one stage, indicating potential developmentally regulated expression. 
A total of 1511 genes showed different expression levels between a parathion-resistant and a susceptible strain. Fifteen detoxification genes, including 2 CCEs, 6 GSTs, and 7 P450s, were expressed at higher levels in the resistant strain. Conclusions: The results of the present study provide new insights into the functions and evolution of three detoxification gene families in mosquitoes and comprehensive transcriptomic resources for C. p. quinquefasciatus, which will facilitate the elucidation of molecular mechanisms underlying the different biological characteristics of the three major mosquito vectors.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 19
    Publication Date: 2012-11-14
    Description: Background: Schistosoma mansoni is one of the causative agents of schistosomiasis, a neglected tropical disease that affects about 237 million people worldwide. Despite recent efforts, we still lack a general understanding of the relevant host-parasite interactions, and the possible treatments are limited by the emergence of resistant strains and the absence of a vaccine. The S. mansoni genome has been completely sequenced and is still under continuous annotation. Nevertheless, more than 45% of the encoded proteins remain without experimental characterization or even functional prediction. To improve our knowledge regarding the biology of this parasite, we conducted a proteome-wide evolutionary analysis to provide a broad view of the evolution of the S. mansoni proteome and to improve its functional annotation. Results: Using a phylogenomic approach, we reconstructed the S. mansoni phylome, which comprises the evolutionary histories of all parasite proteins and their homologs across 12 other organisms. The analysis of a total of 7,964 phylogenies allowed a deeper understanding of genomic complexity and evolutionary adaptations to a parasitic lifestyle. In particular, the identification of lineage-specific gene duplications pointed to the diversification of several protein families that are relevant for host-parasite interaction, including proteases, tetraspanins, fucosyltransferases, venom allergen-like proteins, and tegumental-allergen-like proteins. In addition to the evolutionary knowledge, the phylome data enabled us to automatically re-annotate 3,451 proteins through a phylogenetic-based approach rather than solely sequence similarity searches. To allow further exploitation of this valuable data, all information has been made available at PhylomeDB (http://www.phylomedb.org). Conclusions: In this study, we used an evolutionary approach to assess S. mansoni parasite biology, improve genome/proteome functional annotation, and provide insights into host-parasite interactions. Taking advantage of a proteome-wide perspective rather than focusing on individual proteins, we identified that this parasite has experienced specific gene duplication events, particularly affecting genes that are potentially related to the parasitic lifestyle. These innovations may be related to the mechanisms that protect S. mansoni against host immune responses, being important adaptations for the parasite's survival in a potentially hostile environment. Continuing this work, a comparative analysis involving genomic, transcriptomic, and proteomic data from other helminth parasites, other parasites, and vectors will supply more information regarding the parasite's biology as well as host-parasite interactions.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 20
    Publication Date: 2012-11-14
    Description: Background: The genomic information which is transcribed into the primary RNA can be altered by RNA editing at the transcriptional or post-transcriptional level, which provides an effective way to create transcript diversity in an organism. Altering can occur through substitutional RNA editing or via the insertion or deletion of nucleotides relative to the original template. Taking advantage of recent high throughput sequencing technology combined with bioinformatics tools, several groups have recently studied the genome-wide substitutional RNA editing profiles in human. However, while insertional/deletional (indel) RNA editing is well known in several lower species, only very scarce evidence supports the existence of insertional editing events in higher organisms such as human, and no previous work has specifically focused on indel differences between RNA and their matching DNA in human. Here, we provide the first study to examine the possibility of genome-wide indel RNA-DNA differences in one human individual, NA12878, whose RNA and matching genome have been deeply sequenced. Results: We apply different computational tools that are capable of identifying indel differences between RNA reads and the matching reference genome and we initially find hundreds of such indel candidates. However, with careful further analysis and filtering, we conclude that all candidates are false-positives created by splice junctions, paralog sequences, diploid alleles, and known genomic indel variations. Conclusions: Overall, our study suggests that indel RNA editing events are unlikely to exist broadly in the human transcriptome and emphasizes the necessity of a robust computational filter pipeline to obtain high confidence RNA-DNA difference results when analyzing high throughput sequencing data as suggested in the recent genome-wide RNA editing studies.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 21
    Publication Date: 2012-11-14
    Description: Background: The development of complex responses to hypoxia has played a key role in the evolution of mammals, as inadequate response to this condition is frequently associated with cardiovascular diseases, developmental disorders, and cancers. Though numerous studies have used mice and rats in order to explore mechanisms that contribute to hypoxia tolerance, these studies are limited due to the high sensitivity of most rodents to severe hypoxia. The blind subterranean mole rat Spalax is a hypoxia tolerant rodent, which exhibits unique longevity and therefore has invaluable potential in hypoxia and cancer research. Results: Using microarrays, transcript abundance was measured in brain and muscle tissues from Spalax and rat individuals exposed to acute and chronic hypoxia for varying durations. We found that Spalax global gene expression response to hypoxia differs from that of rat and is characterized by the activation of functional groups of genes that have not been strongly associated with the response to hypoxia in hypoxia sensitive mammals. Using functional enrichment analysis of Spalax hypoxia induced genes we found highly significant overrepresentation of groups of genes involved in anti apoptosis, cancer, embryonic/sexual development, epidermal growth factor receptor binding, coordinated suppression and activation of distinct groups of transcription factors and membrane receptors, in addition to angiogenic related processes. We also detected hypoxia induced increases of different critical Spalax hub gene transcripts, including antiangiogenic genes associated with cancer tolerance in Down syndrome human individuals. Conclusions: This is the most comprehensive study of Spalax large scale gene expression response to hypoxia to date, and the first to use custom Spalax microarrays. 
Our work presents novel patterns that may underlie mechanisms with critical importance to the evolution of hypoxia tolerance, with special relevance to medical research.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 22
    Publication Date: 2012-11-15
    Description: Background: Time-course gene expression data such as yeast cell cycle data may be periodically expressed. To cluster such data, currently used Fourier series approximations of periodic gene expressions have been found not to be sufficiently adequate to model the complexity of the time-course data, partly due to their ignoring the dependence between the expression measurements over time and the correlation among gene expression profiles. We further investigate the advantages and limitations of available models in the literature and propose a new mixture model with autoregressive random effects of the first order for the clustering of time-course gene-expression profiles. Some simulations and real examples are given to demonstrate the usefulness of the proposed models. Results: We illustrate the applicability of our new model using synthetic and real time-course datasets. We show that our model outperforms existing models to provide more reliable and robust clustering of time-course data. Our model provides superior results when genetic profiles are correlated. It also gives comparable results when the correlation between the gene profiles is weak. In the applications to real time-course data, relevant clusters of co-regulated genes are obtained, which are supported by gene-function annotation databases. Conclusions: Our new model under our extension of the EMMIX-WIRE procedure is more reliable and robust for clustering time-course data because it adopts a random effects model that allows for the correlation among observations at different time points. It postulates gene-specific random effects with an auto-correlation variance structure that models coregulation within the clusters. The developed R package is flexible in its specification of the random effects through user-input parameters that enable improved modelling and consequent clustering of time-course data.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 23
    Publication Date: 2012-11-15
    Description: Background: Mitochondrial (mt) genomes vary considerably in size, structure and gene content. The mt genomes of the phylum Apicomplexa, which includes important human pathogens such as the malaria parasite Plasmodium, also show marked diversity of structure. Plasmodium has a concatenated linear mt genome of the smallest size (6-kb); Babesia and Theileria have a linear monomeric mt genome (6.5-kb to 8.2-kb) with terminal inverted repeats; Eimeria, which is distantly related to Plasmodium and Babesia/Theileria, possesses a mt genome (6.2-kb) with a concatemeric form similar to that of Plasmodium; Cryptosporidium, the earliest branching lineage within the phylum Apicomplexa, has no mt genome. We are interested in the evolutionary origin of linear mt genomes of Babesia/Theileria, and have investigated mt genome structures in members of archaeopiroplasmid, a lineage branched off earlier from Babesia/Theileria. Results: The complete mt genomes of archaeopiroplasmid parasites, Babesia microti and Babesia rodhaini, were sequenced. The mt genomes of B. microti (11.1-kb) and B. rodhaini (6.9-kb) possess two pairs of unique inverted repeats, IR-A and IR-B. Flip-flop inversions between two IR-As and between two IR-Bs appear to generate four distinct genome structures that are present at an equi-molar ratio. An individual parasite contained multiple mt genome structures, with 20 copies and 2 - 3 copies per haploid nuclear genome in B. microti and B. rodhaini, respectively. Conclusion: We found a novel linear monomeric mt genome structure of B. microti and B. rodhaini equipped with a dual flip-flop inversion system, by which four distinct genome structures are readily generated. To our knowledge, this study is the first to report the presence of two pairs of distinct IR sequences within a monomeric linear mt genome. The present finding provides insight into further understanding of evolution of mt genome structure.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 24
    Publication Date: 2012-11-15
    Description: Background: Leishmania major, a protozoan parasite, is the causative agent of cutaneous leishmaniasis. Due to the development of resistance against the currently available anti-leishmanial drugs, there is a growing need for specific inhibitors and novel drug targets. In this regard, aminoacyl tRNA synthetases, the linchpins of protein synthesis, have received recent attention among the kinetoplastid research community. This is the first comprehensive survey of the aminoacyl tRNA synthetases, their paralogs and other associated proteins from L. major. Results: A total of 26 aminoacyl tRNA synthetases were identified using various computational and bioinformatics tools. Phylogenetic analysis and domain architectures of the L. major aminoacyl tRNA synthetases suggest a probable archaeal/eukaryotic origin. Presence of additional domains or N- or C-terminal extensions in 11 aminoacyl tRNA synthetases from L. major suggests possibilities such as additional tRNA binding or oligomerization or editing activity. Five freestanding editing domains were identified in L. major. Domain assignment revealed a novel asparagine tRNA synthetase paralog, asparagine synthetase A which has been so far reported from prokaryotes and archaea. Conclusions: A comprehensive bioinformatic analysis revealed 26 aminoacyl tRNA synthetases and five freestanding editing domains in L. major. Identification of two EMAP (endothelial monocyte-activating polypeptide) II-like proteins similar to human EMAP II-like proteins suggests their participation in multisynthetase complex formation. While the phylogeny of tRNA synthetases suggests a probable archaeal/eukaryotic origin, phylogeny of asparagine synthetase A strongly suggests a bacterial origin. The unique features identified in this work provide rationale for designing inhibitors against parasite aminoacyl tRNA synthetases and their paralogs.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 25
    Publication Date: 2012-11-15
    Description: Background: To balance the demand for uptake of essential elements with their potential toxicity living cells have complex regulatory mechanisms. Here, we describe a genome-wide screen to identify genes that impact the elemental composition ('ionome') of yeast Saccharomyces cerevisiae. Using inductively coupled plasma -- mass spectrometry (ICP-MS) we quantify Ca, Cd, Co, Cu, Fe, K, Mg, Mn, Mo, Na, Ni, P, S and Zn in 11890 mutant strains, including 4940 haploid and 1127 diploid deletion strains, and 5798 over expression strains. Results: We identified 1065 strains with an altered ionome, including 584 haploid and 35 diploid deletion strains, and 446 over expression strains. Disruption of protein metabolism or trafficking has the highest likelihood of causing large ionomic changes, with gene dosage also being important. Gene over expression produced more extreme ionomic changes, but over expression and loss of function phenotypes are generally not related. Ionomic clustering revealed the existence of only a small number of possible ionomic profiles suggesting fitness tradeoffs that constrain the ionome. Clustering also identified important roles for the mitochondria, vacuole and ESCRT pathway in regulation of the ionome. Network analysis identified hub genes such as PMR1 in Mn homeostasis, novel members of ionomic networks such as SMF3 in vacuolar retrieval of Mn, and cross-talk between the mitochondria and the vacuole. All yeast ionomic data can be searched and downloaded at www.ionomicshub.org. Conclusions: Here, we demonstrate the power of high-throughput ICP-MS analysis to functionally dissect the ionome on a genome-wide scale. The information this reveals has the potential to benefit both human health and agriculture.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 26
    Publication Date: 2012-11-15
    Description: Background: MicroRNA (miRNA) directed gene repression is an important mechanism of posttranscriptional regulation. Comprehensive analyses of how microRNA influence biological processes require paired miRNA-mRNA expression datasets. However, a review of both GEO and ArrayExpress repositories revealed few such datasets, which was in stark contrast to the large number of messenger RNA (mRNA) only datasets. It is of interest that numerous primary miRNAs (precursors of microRNA) are known to be co-expressed with coding genes (host genes). Results: We developed a miRNA-mRNA interaction analyses pipeline. The proposed solution is based on two miRNA expression prediction methods: a scaling function and a linear model. Additionally, miRNA-mRNA anti-correlation analyses are used to determine the most probable miRNA gene targets (i.e. the differentially expressed genes under the influence of up- or down-regulated microRNA). Both the consistency and accuracy of the prediction method are ensured by the application of stringent statistical methods. Finally, the predicted targets are subjected to functional enrichment analyses including GO, KEGG and DO, to better understand the predicted interactions. Conclusions: The MMpred pipeline requires only mRNA expression data as input and is independent of third party miRNA target prediction methods. The method passed extensive numerical validation based on the binding energy between the mature miRNA and 3' UTR region of the target gene. We report that MMpred is capable of generating results similar to those obtained using paired datasets. For the reported test cases we generated consistent output and predicted biological relationships that will help formulate further testable hypotheses.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 27
    Publication Date: 2012-11-16
    Description: Background: 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative with respect to whether an insertion or a deletion error is more likely. Surprisingly, not much effort has been devoted to the development of improved base-calling methods and more intuitive quality scores for this platform. Results: We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed based on estimated probabilities for each homopolymer length, which are easily transformed to useful quality scores. Conclusions: Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative towards possible insertion and deletion errors, while maintaining a base-calling accuracy that is better than the current one. Given the generality of the framework, HPCall has the potential to also adapt to other homopolymer-sensitive sequencing technologies.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 28
    Publication Date: 2012-11-16
    Description: Background: Pantoea spp. are frequently isolated from a wide range of ecological niches and have various biological roles, as plant epi- or endophytes, biocontrol agents, plant-growth promoters or as pathogens of both plant and animal hosts. This suggests that members of this genus have undergone extensive genotypic diversification. One means by which this occurs among bacteria is through the acquisition and maintenance of plasmids. Here, we have analyzed and compared the sequences of a large plasmid common to all sequenced Pantoea spp. Results and discussion: The Large Pantoea Plasmids (LPP-1) of twenty strains encompassing seven different Pantoea species, including pathogens and endo-/epiphytes of a wide range of plant hosts as well as insect-associated strains, were compared. The LPP-1 plasmid sequences range in size from ~281 to 794 kb and carry between 238 and 750 protein coding sequences (CDS). A core set of 46 proteins, encompassing 2.2% of the total pan-plasmid (2,095 CDS), conserved among all LPP-1 plasmid sequences, includes those required for thiamine and pigment biosynthesis. Phylogenetic analysis reveals that these plasmids have arisen from an ancestral plasmid, which has undergone extensive diversification. Analysis of the proteins encoded on LPP-1 also showed that these plasmids contribute to a wide range of Pantoea phenotypes, including the transport and catabolism of various substrates, inorganic ion assimilation, resistance to antibiotics and heavy metals, colonization and persistence in the host and environment, pathogenesis and antibiosis. Conclusions: LPP-1 is universal to all Pantoea spp. whose genomes have been sequenced to date and is derived from an ancestral plasmid. LPP-1 encodes a large array of proteins that have played a major role in the adaptation of the different Pantoea spp. to their various ecological niches and their specialization as pathogens, biocontrol agents or benign saprophytes found in many diverse environments.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 29
    Publication Date: 2012-11-16
    Description: Background: The 2009 pandemic H1N1 influenza virus emerged in swine and quickly became a major global health threat. In mouse, non-human primate, and swine infection models, the pH1N1 virus efficiently replicates in the lung and induces pro-inflammatory host responses; however, whether similar or different cellular pathways were impacted by pH1N1 virus across independent infection models remains to be further defined. To address this we have performed a comparative transcriptomic analysis of acute phase responses to a single pH1N1 influenza virus, A/California/04/2009 (CA04), in the lung of mice, macaques and swine. Results: Despite similarities in the clinical course, we observed differences in inflammatory molecules elicited, and the kinetics of their gene expression changes across all three species. We found genes associated with the retinoid X receptor (RXR) signaling pathway known to control pro-inflammatory and metabolic processes that were differentially regulated during infection in each species, though the heterodimeric RXR partner, pathway associated signaling molecules, and gene expression patterns varied among the three species. Conclusions: By comparing transcriptional changes in the context of clinical and virological measures, we identified differences in the host transcriptional response to pH1N1 virus across independent models of acute infection. Antiviral resistance and the emergence of new influenza viruses have placed more focus on developing drugs that target the immune system. Underlying overt clinical disease are molecular events that suggest therapeutic targets identified in one host may not be appropriate in another.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 30
    Publication Date: 2012-11-16
    Description: Background: The pig is a biomedical model to study human and livestock traits. Many of these traits are controlled by neuropeptides that result from the cleavage of prohormones by prohormone convertases. Only 45 prohormones have been confirmed in the pig. Sequence homology can be ineffective to annotate prohormone genes in sequenced species like the pig due to the multifactorial nature of the prohormone processing. The goal of this study is to undertake the first complete survey of prohormone and prohormone convertases genes in the pig genome. These genes were functionally annotated based on 35 gene expression microarray experiments. The cleavage sites of prohormone sequences into potentially active neuropeptides were predicted. Results: We identified 95 unique prohormone genes, 2 alternative calcitonin-related sequences, 8 prohormone convertases and 1 cleavage facilitator in the pig genome 10.2 assembly and trace archives. Of these, 11 pig prohormone genes have not been reported in the UniProt, UniGene or Gene databases. These genes are intermedin, cortistatin, insulin-like 5, orexigenic neuropeptide QRFP, prokineticin 2, prolactin-releasing peptide, parathyroid hormone 2, urocortin, urocortin 2, urocortin 3, and urotensin 2-related peptide. In addition, a novel neuropeptide S was identified in the pig genome correcting the previously reported pig sequence that is identical to the rabbit sequence. Most differentially expressed prohormone genes were under-expressed in pigs experiencing immune challenge relative to the un-challenged controls, in non-pregnant relative to pregnant sows, in old relative to young embryos, and in non-neural relative to neural tissues. The cleavage prediction based on human sequences had the best performance with a correct classification rate of cleaved and non-cleaved sites of 92% suggesting that the processing of prohormones in pigs is similar to humans. 
The cleavage prediction models did not find conclusive evidence supporting the production of the bioactive neuropeptides urocortin 2, urocortin 3, torsin family 2 member A, tachykinin 4, islet amyloid polypeptide, and calcitonin receptor-stimulating peptide 2 in the pig. Conclusions: The present genomic and functional characterization supports the use of the pig as an effective animal model to gain a deeper understanding of prohormones, prohormone convertases and neuropeptides in biomedical and agricultural research.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 31
    Publication Date: 2012-11-16
    Description: Background: Understanding the causes underlying heterogeneity of molecular evolutionary rates among lineages is a long-standing and central question in evolutionary biology. Although several earlier studies showed that modern frogs (Neobatrachia) experienced an acceleration of mitochondrial gene substitution rates compared to non-neobatrachian relatives, no further characterization of this phenomenon was attempted. To gain new insights on this topic, we sequenced the complete mitochondrial genomes and nine nuclear loci of one pelobatoid (Pelodytes punctatus) and five neobatrachians, Heleophryne regis (Heleophrynidae), Lechriodus melanopyga (Limnodynastidae), Calyptocephalella gayi (Calyptocephalellidae), Telmatobius bolivianus (Ceratophryidae), and Sooglossus thomasseti (Sooglossidae). These represent major clades not included in previous mitogenomic analyses, and most of them are remarkably species-poor compared to other neobatrachians. Results: We reconstructed a fully resolved and robust phylogeny of extant frogs based on the new mitochondrial and nuclear sequence data, and dated major cladogenetic events. The reconstructed tree recovered Heleophryne as sister group to all other neobatrachians, the Australasian Lechriodus and the South American Calyptocephalella formed a clade that was the sister group to Nobleobatrachia, and the Seychellois Sooglossus was recovered as the sister group of Ranoides. We used relative-rate tests and direct comparison of branch lengths from mitochondrial and nuclear-based trees to demonstrate that both mitochondrial and nuclear evolutionary rates are significantly higher in all neobatrachians compared to their non-neobatrachian relatives, and that such rate acceleration started at the origin of Neobatrachia. 
Conclusions: Through the analysis of the selection coefficient (omega) in different branches of the tree, we found compelling evidence of relaxation of purifying selection in neobatrachians, which could (at least in part) explain the observed higher mitochondrial and nuclear substitution rates in this clade. Our analyses allowed us to rule out that changes in substitution rates are correlated with the increased mitochondrial genome rearrangement or diversification rates observed in different lineages of neobatrachians.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 32
    Publication Date: 2012-11-16
    Description: Background: Carcass fatness is an important trait in most pig breeding programs. Following market requests, breeding plans for fresh pork consumption are usually designed to reduce carcass fat content and increase lean meat deposition. However, the Italian pig industry is mainly devoted to the production of Protected Designation of Origin dry cured hams: pigs are slaughtered at around 160 kg of live weight and the breeding goal aims at maintaining fat coverage, measured as backfat thickness, to avoid excessive desiccation of the hams. This objective has shaped the genetic pool of Italian heavy pig breeds for a few decades. In this study we applied a selective genotyping approach within a population of ~12,000 performance-tested Italian Large White pigs. Within this population, we selectively genotyped 304 pigs with extreme and divergent backfat thickness estimated breeding values using the Illumina PorcineSNP60 BeadChip and performed a genome-wide association study to identify loci associated with this trait. Results: We identified 4 single nucleotide polymorphisms with P≤5.0E-07 and additional 119 ones with 5.0E-07
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 33
    Publication Date: 2012-11-16
    Description: Background: Along with the draft sequencing of the pig genome, which has been completed by an international consortium, collection of the nucleotide sequences of genes expressed in various tissues and determination of entire cDNA sequences are necessary for investigations of gene function. The sequences of expressed genes are also useful for genome annotation, which is important for isolating the genes responsible for particular traits. Results: We performed a large-scale expressed sequence tag (EST) analysis in pigs by using 32 full-length-enriched cDNA libraries derived from 28 kinds of tissues and cells, including seven tissues (brain, cerebellum, colon, hypothalamus, inguinal lymph node, ovary, and spleen) derived from pigs that were cloned from a sow subjected to genome sequencing. We obtained more than 330,000 EST reads from the 5′-ends of the cDNA clones. Comparison with human and bovine gene catalogs revealed that the ESTs corresponded to at least 15,000 genes. cDNA clones representing contigs and singlets generated by assembly of the EST reads were subjected to full-length determination of inserts. We have finished sequencing 31,079 cDNA clones corresponding to more than 12,000 genes. Mapping of the sequences of these cDNA clones on the draft sequence of the pig genome has indicated that the clones are derived from about 15,000 independent loci on the pig genome. Conclusions: ESTs and cDNA sequences derived from full-length-enriched libraries are valuable for annotation of the draft sequence of the pig genome. This information will also contribute to the exploration of promoter sequences on the genome and to molecular biology-based analyses in pigs.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 34
    Publication Date: 2012-11-16
    Description: Background: The release of the porcine genome sequence offers great perspectives for pig genetics and genomics, and more generally will contribute to the understanding of mammalian genome biology and evolution. The process of producing a complete genome sequence of high quality, while facilitated by high-throughput sequencing technologies, remains a difficult task. The porcine genome was sequenced using a combination of a hierarchical shotgun strategy and data generated with whole genome shotgun. In addition to the BAC contig map used for the clone-by-clone approach, genomic mapping resources for the pig include two radiation hybrid (RH) panels at two different resolutions. These two panels have been used extensively for the physical mapping of pig genes and markers prior to the availability of the pig genome sequence. Results: In order to contribute to the assembly of the pig genome, we genotyped the two radiation hybrid (RH) panels with a SNP array (the Illumina porcineSNP60 array) and produced high density physical RH maps for each pig autosome. We first present the methods developed to obtain high density RH maps with 38,379 SNPs from the SNP array genotyping. We then show how they were useful to identify problems in a draft of the pig genome assembly, and how the RH maps enabled the problems to be corrected in the porcine genome sequence. Finally, we used the RH maps to predict the position of 2,703 SNPs and 1,328 scaffolds currently unplaced on the porcine genome assembly. Conclusions: A complete process, from genotyping of a high density SNP array on RH panels, to the construction of genome-wide high density RH maps, and finally their exploitation for validating and improving a genome assembly, is presented here. The study includes the cross-validation of RH based findings with independent information from genetic data and comparative mapping with the human genome.
Several additional resources are also provided, in particular the predicted genomic location of currently unplaced SNPs and associated scaffolds summing up to a total of 72 megabases, that can be useful for the exploitation of the pig genome assembly.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 35
    Publication Date: 2012-11-16
    Description: Background: The application of DNA markers for the identification of biological samples from both human and non-human species is widespread and includes use in food authentication. In the food industry the financial incentive to substitute the true name of a food product with a higher value alternative is driving food fraud. This applies to British pork products, where products derived from traditional pig breeds are of premium value. The objective of this study was to develop a genetic assay for regulatory authentication of traditional pig breed-labelled products in the porcine food industry in the United Kingdom. Results: The dataset comprised comprehensive coverage of breed types present in Britain: 460 individuals from 7 traditional breeds, 5 commercial purebreds, 1 imported European breed and 1 imported Asian breed were genotyped using the PorcineSNP60 beadchip. Following breed-informative SNP selection, assignment power was calculated for increasing SNP panel size. A 96-plex assay created using the most informative SNPs revealed remarkably high genetic differentiation between the British pig breeds, with an average FST of 0.54, and Bayesian clustering analysis also indicated that they were distinct homogeneous populations. The posterior probability of assignment of any individual of a presumed origin actually originating from that breed given an alternative breed origin was > 99.5% in 174 out of 182 contrasts, at a test value of log(LR) > 0. Validation of the 96-plex assay using independent test samples of known origin was successful; a subsequent survey of market samples revealed a high level of breed label conformity. Conclusion: The newly created 96-plex assay using selected markers from the PorcineSNP60 beadchip enables powerful assignment of samples to traditional breed origin and can effectively identify mislabelling, providing a highly effective tool for DNA analysis in food forensics.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 36
    Publication Date: 2012-11-16
    Description: Background: Insects and animals can recognize surrounding environments by detecting thousands of chemical odorants. Olfaction is a complicated process that begins in the olfactory epithelium with the specific binding of volatile odorant molecules to dedicated olfactory receptors (ORs). OR proteins are encoded by the largest gene superfamily in the mammalian genome. Results: We report here the whole genome analysis of the olfactory receptor genes of S. scrofa using conserved OR gene specific motifs and known OR protein sequences from diverse species. We identified 1,301 OR related sequences from the S. scrofa genome assembly, Sscrofa10.2, including 1,113 functional OR genes and 188 pseudogenes. OR genes were located in 46 different regions on 16 pig chromosomes. We classified the ORs into 17 families, three Class I and 14 Class II families, and further grouped them into 349 subfamilies. We also identified inter- and intra-chromosomal duplications of OR genes residing on 11 chromosomes. A significant number of pig OR genes (n = 212) showed less than 60% amino acid sequence similarity to known OR genes of other species. Conclusion: As the genome assembly Sscrofa10.2 covers 99.9% of the pig genome, our analysis represents an almost complete OR gene repertoire from an individual pig genome. We show that S. scrofa has one of the largest OR repertoires, suggesting an expansion of OR genes in the swine genome. A significant number of unique OR genes in the pig genome may suggest the presence of swine specific olfactory stimulation.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 37
    Publication Date: 2012-12-09
    Description: Background: Multivariate approaches have been successfully applied to genome wide association studies. Recently, a Partial Least Squares (PLS) based approach was introduced for mapping yeast genotype-phenotype relations, where background information such as gene function classification, gene dispensability, recent or ancient gene copy number variations and the presence of premature stop codons or frameshift mutations in reading frames was used post hoc to explain selected genes. One of the latest advancements in PLS, named L-Partial Least Squares (L-PLS), where 'L' represents the data structure used, enables the use of background information at the modeling level. Here, a modification of L-PLS with variable importance on projection (VIP) was implemented using a stepwise regularized procedure for gene and background information selection. Results were compared to PLS-based procedures, where no background information was used. Results: Applying the proposed methodology to yeast Saccharomyces cerevisiae data, we found the relationship between genotype-phenotype to have improved understandability. Phenotypic variations were explained by the variations of relatively stable genes and stable background variations. The suggested procedure provides an automatic way for genotype-phenotype mapping. The selected phenotype influencing genes were evolving 29% faster than non-influential genes, and the current results are supported by a recently conducted study. Further power analysis on simulated data verified that the proposed methodology selects relevant variables. Conclusions: A modification of L-PLS with VIP in a stepwise regularized elimination procedure can improve the understandability and stability of selected genes and background information. The approach is recommended for genome wide association studies where background information is available.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 38
    Publication Date: 2012-12-09
    Description: Background: Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? Results: The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. 
One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all 5 ensembles, but typically at the cost of decreased specificity. Conclusion: Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
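The two aggregation methods named in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so the 0.5 probability cutoff, the `min_votes` parameter, and the example probabilities are all assumptions made for illustration only.

```python
from statistics import mean


def average_probability(probs, cutoff=0.5):
    """Call rejection when the mean of the classifiers' probabilities reaches cutoff."""
    return mean(probs) >= cutoff


def vote_threshold(probs, cutoff=0.5, min_votes=1):
    """Call rejection when at least min_votes classifiers individually reach cutoff."""
    votes = sum(p >= cutoff for p in probs)
    return votes >= min_votes


# Hypothetical probabilities of acute rejection from four classifiers for one patient.
patient = [0.9, 0.2, 0.3, 0.1]
print(average_probability(patient))           # the mean is below the assumed cutoff
print(vote_threshold(patient, min_votes=1))   # one classifier alone triggers the call
```

With a low vote requirement a single confident classifier is enough to flag a patient, which is consistent with the reported pattern of higher sensitivity at the cost of specificity.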
  • 39
    Publication Date: 2012-12-10
    Description: Background: Co-expression measures are often used to define networks among genes. Mutual information (MI) is often used as a generalized correlation measure. It is not clear how much MI adds beyond standard (robust) correlation measures or regression model based association measures. Further, it is important to assess what transformations of these and other co-expression measures lead to biologically meaningful modules (clusters of genes). Results: We provide a comprehensive comparison between mutual information and several correlation measures in 8 empirical data sets and in simulations. We also study different approaches for transforming an adjacency matrix, e.g. using the topological overlap measure. Overall, we confirm close relationships between MI and correlation in all data sets which reflects the fact that most gene pairs satisfy linear or monotonic relationships. We discuss rare situations when the two measures disagree. We also compare correlation and MI based approaches when it comes to defining co-expression network modules. We show that a robust measure of correlation (the biweight midcorrelation transformed via the topological overlap transformation) leads to modules that are superior to MI based modules and maximal information coefficient (MIC) based modules in terms of gene ontology enrichment. We present a function that relates correlation to mutual information which can be used to approximate the mutual information from the corresponding correlation coefficient. We propose the use of polynomial or spline regression models as an alternative to MI for capturing non-linear relationships between quantitative variables. Conclusions: The biweight midcorrelation outperforms MI in terms of elucidating gene pairwise relationships. Coupled with the topological overlap matrix transformation, it often leads to more significantly enriched co-expression modules. 
Spline and polynomial networks form attractive alternatives to MI in case of non-linear relationships. Our results indicate that MI networks can safely be replaced by correlation networks when it comes to measuring co-expression relationships in stationary data.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
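For readers unfamiliar with the biweight midcorrelation mentioned in the abstract above, the following is a minimal pure-Python sketch of its commonly cited definition: deviations from the median are down-weighted by a biweight function of the median absolute deviation (MAD), with the usual tuning constant of 9. It is not taken from the paper's software, and the example data are invented.

```python
from math import sqrt
from statistics import median


def _weighted_deviations(values):
    """Return (v - median) * biweight weight for each value."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; biweight weights are undefined")
    terms = []
    for v in values:
        u = (v - med) / (9 * mad)
        # Points further than 9 MADs from the median get weight zero.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        terms.append((v - med) * w)
    return terms


def bicor(x, y):
    """Biweight midcorrelation of two equal-length sequences."""
    a, b = _weighted_deviations(x), _weighted_deviations(y)
    num = sum(p * q for p, q in zip(a, b))
    den = sqrt(sum(p * p for p in a)) * sqrt(sum(q * q for q in b))
    return num / den


# Invented data: an increasing trend plus one discordant outlier pair.
x = [1, 2, 3, 4, 5, 100]
y = [2, 4, 6, 8, 10, -50]
print(bicor(x, y))  # stays positive because the outlier pair receives zero weight
```

The outlier pair would dominate an ordinary Pearson correlation; here it is ignored, which is the robustness property the abstract relies on.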
  • 40
    Publication Date: 2012-12-11
    Description: Background: Batch effects are a type of variability that is not of primary interest but is ubiquitous in sizable genomic experiments. To minimize the impact of batch effects, an ideal experimental design should ensure the even distribution of biological groups and confounding factors across batches. However, due to practical complications, the final collection of samples available in a genomics study might be unbalanced and incomplete, which, without appropriate attention to sample-to-batch allocation, could lead to drastic batch effects. Therefore, it is necessary to develop an effective and handy tool to assign collected samples across batches in an appropriate way in order to minimize batch effects. Results: We describe OSAT (Optimal Sample Assignment Tool), a Bioconductor package designed for automated sample-to-batch allocations in genomics experiments. Conclusions: OSAT is developed to facilitate the allocation of collected samples to different batches in genomics studies. By optimizing the even distribution of samples in groups of biological interest into different batches, it can reduce the confounding or correlation between batches and the biological variables of interest. It can also optimize the homogeneous distribution of confounding factors across batches. It can handle challenging instances where incomplete and unbalanced sample collections are involved as well as ideally balanced designs.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
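The idea of spreading biological groups evenly across batches can be illustrated with a simple round-robin sketch. This is not OSAT's algorithm or API (OSAT is an R/Bioconductor package whose optimization procedure is not described in the abstract); the function name `allocate` and the sample labels are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import cycle


def allocate(samples, n_batches):
    """Deal samples to batches group by group so each group is spread evenly.

    samples: list of (sample_id, group) pairs.
    Returns a dict mapping batch index to a list of sample ids.
    """
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)
    batches = defaultdict(list)
    slots = cycle(range(n_batches))  # one continuous cycle keeps batch sizes balanced
    for group in sorted(by_group):
        for sample_id in by_group[group]:
            batches[next(slots)].append(sample_id)
    return dict(batches)


# Hypothetical unbalanced collection: 6 cases and 3 controls, 3 batches.
samples = [(f"case{i}", "case") for i in range(6)]
samples += [(f"ctrl{i}", "control") for i in range(3)]
group_of = dict(samples)
for batch, ids in sorted(allocate(samples, 3).items()):
    print(batch, Counter(group_of[s] for s in ids))
```

Each batch ends up with the same case-to-control ratio as the whole collection, so batch is not confounded with group. A real tool would also have to balance several covariates at once, which simple dealing cannot do.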
  • 41
    Publication Date: 2012-12-11
    Description: Background: The bacterium Pelobacter carbinolicus is able to grow by fermentation, syntrophic hydrogen/formate transfer, or electron transfer to sulfur from short-chain alcohols, hydrogen or formate; it does not oxidize acetate and is not known to ferment any sugars or grow autotrophically. The genome of P. carbinolicus was sequenced in order to understand its metabolic capabilities and physiological features in comparison with its relatives, acetate-oxidizing Geobacter species. Results: Pathways were predicted for catabolism of known substrates: 2,3-butanediol, acetoin, glycerol, 1,2-ethanediol, ethanolamine, choline and ethanol. Multiple isozymes of 2,3-butanediol dehydrogenase, ATP synthase and [FeFe]-hydrogenase were differentiated and assigned roles according to their structural properties and genomic contexts. The absence of asparagine synthetase and the presence of a mutant tRNA for asparagine encoded among RNA-active enzymes suggest that P. carbinolicus may make asparaginyl-tRNA in a novel way. Catabolic glutamate dehydrogenases were discovered, implying that the tricarboxylic acid (TCA) cycle can function catabolically. A phosphotransferase system for uptake of sugars was discovered, along with enzymes that function in 2,3-butanediol production. Pyruvate:ferredoxin/flavodoxin oxidoreductase was identified as a potential bottleneck in both the supply of oxaloacetate for oxidation of acetate by the TCA cycle and the connection of glycolysis to production of ethanol. The P. carbinolicus genome was found to encode autotransporters and various appendages, including three proteins with similarity to the geopilin of electroconductive nanowires. Conclusions: Several surprising metabolic capabilities and physiological features were predicted from the genome of P. carbinolicus, suggesting that it is more versatile than anticipated.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 42
    Publication Date: 2012-12-11
    Description: Background: Brown planthopper (BPH), Nilaparvata lugens Stal, is one of the most destructive insect pests of rice. The molecular responses of plants to sucking insects resemble responses to pathogen infection. However, the molecular mechanism of BPH-resistance in rice remains unclear. Transcription factors (TF) are up-stream regulators of various genes that bind to specific DNA sequences, thereby controlling the transcription from DNA to mRNA. They are key regulators for transcriptional expression in biological processes, and are probably involved in the BPH-induced pathways in resistant rice varieties. Results: We conducted a microarray experiment to analyze TF genes related to BPH resistance in a Sri Lankan rice cultivar, Rathu Heenati (RHT). We compared the expression profiles of TF genes in RHT with those of the susceptible rice cultivar Taichun Native 1 (TN1). We detected 2038 TF genes showing differential expression signals between the two rice varieties. Of these, 442 TF genes were probably related to BPH-induced resistance in RHT and TN1, and 229 may be related to constitutive resistance only in RHT. These genes showed a fold change (FC) of more than 2.0 (P 10, there were 37 induced TF genes and 26 constitutive resistance TF genes. Of these, 13 were probably involved in BPH-induced resistance, and 8 in constitutive resistance to BPH in RHT. Conclusions: We explored the molecular mechanism of resistance to BPH in rice by comparing expressions of TF genes between RHT and TN1. We speculate that the level of gene repression, especially for early TF genes, plays an important role in the defense response. The fundamental point of the resistance strategy is that plants protect themselves by reducing their metabolic level to inhibit feeding by BPH and prevent damage from water and nutrient loss. We have selected 21 TF genes related to BPH resistance for further analyses to understand the molecular responses to BPH feeding in rice.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 43
    Publication Date: 2012-12-11
    Description: Background: Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise that is demanded to manipulate them, which may constitute a potential barrier for the experimental community. Results: CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes that is furnished with published chromatin signatures associated to transcriptionally active regions and other experimental sources of information. The rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions: The CBS platform is a powerful resource that provides tools for data mining individual sequences and groups of co-expressed genes with epigenomics information to conduct regulatory screenings in Drosophila.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 44
    Publication Date: 2012-12-12
    Description: Background: Illumina BeadArray technology includes non-specific negative control features that allow a precise estimation of the background noise. As an alternative to the background subtraction proposed in BeadStudio, which leads to an important loss of information by generating negative values, a background correction method modeling the observed intensities as the sum of an exponentially distributed signal and normally distributed noise has been developed. Nevertheless, Wang and Ye (2012) display a kernel-based estimator of the signal distribution on Illumina BeadArrays and suggest that a gamma distribution would represent a better modeling of the signal density. Hence, the normal-exponential modeling may not be appropriate for Illumina data and background corrections derived from this model may lead to wrong estimation. Results: We propose a more flexible modeling based on a gamma distributed signal and a normally distributed background noise and develop the associated background correction, implemented in the R-package NormalGamma. Our model proves to be markedly more accurate to model Illumina BeadArrays: on the one hand, it is shown on two types of Illumina BeadChips that this model offers a more correct fit of the observed intensities. On the other hand, the comparison of the operating characteristics of several background correction procedures on spike-in and on normal-gamma simulated data shows high similarities, reinforcing the validation of the normal-gamma modeling. The performance of the background corrections based on the normal-gamma and normal-exponential models is compared on two dilution data sets, through testing procedures which represent various experimental designs. Surprisingly, we observe that the implementation of a more accurate parametrisation in the model-based background correction does not increase the sensitivity.
These results may be explained by the operating characteristics of the estimators: the normal-gamma background correction offers an improvement in terms of bias, but at the cost of a loss in precision. Conclusions: This paper addresses the lack of fit of the usual normal-exponential model by proposing a more flexible parametrisation of the signal distribution as well as the associated background correction. This new model proves to be considerably more accurate for Illumina microarrays, but the improvement in terms of modeling does not lead to a higher sensitivity in differential analysis. Nevertheless, this realistic modeling makes way for future investigations, in particular to examine the characteristics of pre-processing strategies.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
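The two signal-plus-noise models contrasted in the abstract above can be written compactly as follows. The symbols ($X$, $S$, $B$, $\mu$, $\sigma$, $\alpha$, $k$, $\theta$) and the independence of signal and noise are notational assumptions made here; the abstract itself gives no formulas.

```latex
% Observed intensity X = true signal S + background noise B (assumed independent)
X = S + B, \qquad B \sim \mathcal{N}(\mu, \sigma^{2})

% Normal-exponential model: exponentially distributed signal with mean \alpha
S \sim \mathrm{Exp}(\alpha), \qquad f_S(s) = \tfrac{1}{\alpha}\, e^{-s/\alpha}, \quad s > 0

% Normal-gamma model: gamma distributed signal with shape k and scale \theta
S \sim \mathrm{Gamma}(k, \theta), \qquad
f_S(s) = \frac{s^{k-1} e^{-s/\theta}}{\Gamma(k)\, \theta^{k}}, \quad s > 0
```

Setting $k = 1$ in the gamma density recovers the exponential density with $\alpha = \theta$, so the normal-gamma model contains the normal-exponential model as a special case, which is why it is described as more flexible.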
  • 45
    Publication Date: 2012-12-12
    Description: Background: Broad-spectrum fluoroquinolone antibiotics are central in modern health care and are used to treat and prevent a wide range of bacterial infections. The recently discovered qnr genes provide a mechanism of resistance with the potential to rapidly spread between bacteria using horizontal gene transfer. As for many antibiotic resistance genes present in pathogens today, qnr genes are hypothesized to originate from environmental bacteria. The vast amount of data generated by shotgun metagenomics can therefore be used to explore the diversity of qnr genes in more detail. Results: In this paper we describe a new method to identify qnr genes in nucleotide sequence data. We show, using cross-validation, that the method has a high statistical power of correctly classifying sequences from novel classes of qnr genes, even for fragments as short as 100 nucleotides. Based on sequences from public repositories, the method was able to identify all previously reported plasmid-mediated qnr genes. In addition, several fragments from novel putative qnr genes were identified in metagenomes. The method was also able to annotate 39 chromosomal variants of which 11 have previously not been reported in literature. Conclusions: The method described in this paper significantly improves the sensitivity and specificity of identification and annotation of qnr genes in nucleotide sequence data. The predicted novel putative qnr genes in the metagenomic data support the hypothesis of a large and uncharacterized diversity within this family of resistance genes in environmental bacterial communities. An implementation of the method is freely available at http://bioinformatics.math.chalmers.se/qnr/.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 46
    Publication Date: 2012-12-12
    Description: Background: Protein-coding regions in human genes harbor 85% of the mutations that are associated with disease-related traits. Compared with whole-genome sequencing of complex samples, exome sequencing serves as an alternative option because of its dramatically reduced cost. In fact, exome sequencing has been successfully applied to identify the cause of several Mendelian disorders, such as Miller syndrome and Schinzel-Giedion syndrome. However, there remain great challenges in handling the huge volume of data generated by exome sequencing and in identifying potential disease-related genetic variations. Results: In this study, Exome-assistant (http://122.228.158.106/exomeassistant), a convenient tool for submitting and annotating single nucleotide polymorphisms (SNPs) and insertion/deletion variations (InDels), was developed to rapidly detect candidate disease-related genetic variations from exome sequencing projects. Versatile filter criteria are provided by Exome-assistant to meet different users' requirements. Exome-assistant consists of four modules: the single case module, the two cases module, the multiple cases module, and the reanalysis module. The two cases and multiple cases modules allow users to identify sample-specific and common variations. The multiple cases module also supports family-based studies and Mendelian filtering. The identified candidate disease-related genetic variations can be annotated according to their sample features. Conclusions: In summary, by exploring exome sequencing data, Exome-assistant can provide researchers with detailed biological insights into genetic variation events and permits the identification of potential genetic causes of human diseases and related traits.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 47
    Publication Date: 2012-12-12
    Description: Background: Protein effectors of pathogenicity are instrumental in modulating host immunity and disease resistance. The powdery mildew pathogen of grasses, Blumeria graminis, causes one of the most important diseases of cereal crops. B. graminis is an obligate biotrophic pathogen and as such has an absolute requirement to suppress or avoid host immunity to survive and cause disease. Results: Here we characterise a superfamily predicted to be the full complement of Candidates for Secreted Effector Proteins (CSEPs) in the fungal barley powdery mildew parasite, B. graminis f.sp. hordei. The 491 genes encoding these proteins constitute over 7% of this pathogen's annotated genes and most were grouped into 72 families of up to 59 members. They were predominantly expressed in the intracellular feeding structures, called haustoria, and proteins specifically associated with haustoria were identified by large-scale mass spectrometry-based proteomics. There are two major types of effector families: one comprises shorter proteins (100–150 amino acids), with a high relative expression level in the haustoria and evidence of extensive diversifying selection between paralogs; the second type consists of longer proteins (300–400 amino acids), with lower levels of differential expression and evidence of purifying selection between paralogs. An analysis of the predicted protein structures reveals polypeptide features that are similar to those of known fungal effectors, but also highlights unexpected structural affinities to ribonucleases throughout the entire effector superfamily. Candidate effector genes belonging to the same family are loosely clustered in the genome and are associated with repetitive DNA derived from retro-transposons. Conclusions: We employed the full complement of genomic, transcriptomic and proteomic analyses as well as structural prediction methods to identify and characterise the members of the CSEP superfamily in B. graminis f.sp. hordei. Based on relative intron position and the distribution of CSEPs with a ribonuclease-like domain in the phylogenetic tree, we hypothesize that the associated genes originated from an ancestral gene, encoding a secreted ribonuclease, duplicated successively by repetitive DNA-driven processes and diversified during the evolution of the grass and cereal powdery mildew lineage.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 48
    Publication Date: 2012-12-12
    Description: Background: Genotyping and massively-parallel sequencing projects result in a vast amount of diploid data that is only rarely resolved into its constituent haplotypes. It is nevertheless this phased information that is transmitted from one generation to the next and is most directly associated with biological function and the genetic causes of biological effects. Despite progress made in genome-wide sequencing and phasing algorithms and methods, problems assembling (and reconstructing linear haplotypes in) regions of repetitive DNA and structural variation remain. These dynamic and structurally complex regions are often poorly understood from a sequence point of view. Regions such as these that are highly similar in their sequence tend to be collapsed onto the genome assembly. This in turn means that downstream determination of the true sequence haplotype in these regions poses a particular challenge. For structurally complex regions, a more focussed approach to assembling haplotypes may be required. Results: In order to investigate reconstruction of spatial information at structurally complex regions, we have used an emulsion haplotype fusion PCR approach to reproducibly link sequences of up to 1kb in length to allow phasing of multiple variants from neighbouring loci, using allele-specific PCR and sequencing to detect the phase. Using emulsion systems that link flanking regions to amplicons within the CNV, we reconstructed a 59kb haplotype across the DEFA1A3 CNV in HapMap individuals. Conclusion: This study has demonstrated a novel use for emulsion haplotype fusion PCR in addressing the issue of reconstructing structural haplotypes at multiallelic copy variable regions, using the DEFA1A3 locus as an example.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 49
    Publication Date: 2012-12-12
    Description: Background: Vitis vinifera berry development is characterised by an initial phase where the fruit is small, hard and acidic, followed by a lag phase known as veraison. In the final phase, berries become larger, softer and sweeter and accumulate an array of organoleptic compounds. Since the physiological and biochemical makeup of grape berries at harvest has a profound impact on the characteristics of wine, there is great interest in characterising the molecular and biophysical changes that occur from flowering through veraison and ripening, including the coordination and temporal regulation of metabolic gene pathways. Advances in deep-sequencing technologies, combined with the availability of increasingly accurate V. vinifera genomic and transcriptomic data, have enabled us to carry out RNA-transcript expression analysis on a global scale at key points during berry development. Results: A total of 162 million 100-base pair reads were generated from pooled Vitis vinifera (cv. Shiraz) berries sampled at 3-weeks post-anthesis, 10- and 11-weeks post-anthesis (corresponding to early and late veraison) and at 17-weeks post-anthesis (harvest). Mapping reads from each developmental stage (36-45 million) onto the NCBI RefSeq transcriptome of 23,720 V. vinifera mRNAs revealed that at least 75% of these transcripts were detected in each sample. RNA-Seq analysis uncovered 4,185 transcripts that were significantly upregulated at a single developmental stage, including 161 transcription factors. Clustering transcripts according to distinct patterns of transcription revealed coordination in metabolic pathways such as organic acid, stilbene and terpenoid metabolism. From the phenylpropanoid/stilbene biosynthetic pathway at least 46 transcripts were upregulated in ripe berries when compared to veraison and immature berries, and 12 terpene synthases were predominantly detected only in a single sample. 
Quantitative real-time PCR was used to validate the expression pattern of 12 differentially expressed genes from primary and secondary metabolic pathways. Conclusions: In this study we report the global transcriptional profile of Shiraz grapes at key stages of development. We have undertaken a comprehensive analysis of gene families contributing to commercially important berry characteristics and present examples of co-regulation and differential gene expression. The data reported here will provide an invaluable resource for the on-going molecular investigation of wine grapes.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 50
    Publication Date: 2012-12-15
    Description: Background: High-throughput re-sequencing is rapidly becoming the method of choice for studies of neutral and adaptive processes in natural populations across taxa. As re-sequencing the genome of large numbers of samples is still cost-prohibitive in many cases, methods for genome complexity reduction have been developed in attempts to capture most ecologically-relevant genetic variation. One of these approaches is sequence capture, in which oligonucleotide baits specific to genomic regions of interest are synthesized and used to retrieve and sequence those regions. Results: We used sequence capture to re-sequence most predicted exons, their upstream regulatory regions, as well as numerous random genomic intervals in a panel of 48 genotypes of the angiosperm tree Populus trichocarpa (black cottonwood, or 'poplar'). A total of 20.76Mb (5%) of the poplar genome was targeted, corresponding to 173,040 baits. With 12 indexed samples run in each of four lanes on an Illumina HiSeq instrument (2x100 paired-end), 86.8% of the bait regions were on average sequenced at a depth ≥10X. Few off-target regions (>250bp away from any bait) were present in the data, but on average ~80bp on either side of the baits were captured and sequenced to an acceptable depth (≥10X) to call heterozygous SNPs. Nucleotide diversity estimates within and adjacent to protein-coding genes were similar to those previously reported in Populus spp., while intergenic regions had higher values consistent with a relaxation of selection. Conclusions: Our results illustrate the efficiency and utility of sequence capture for re-sequencing highly heterozygous tree genomes, and suggest design considerations to optimize the use of baits in future studies.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 51
    Publication Date: 2012-12-16
    Description: Background: Genomic regions controlling abdominal fatness (AF) were studied in the Northeast Agricultural University broiler line divergently selected for AF. In this study, the chicken 60KSNP chip and extended haplotype homozygosity (EHH) test were used to detect genome-wide signatures of AF. Results: A total of 5357 and 5593 core regions were detected in the lean and fat lines, and 51 and 57 reached a significant level (P
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 52
    Publication Date: 2012-12-18
    Description: Background: Rye is an important European crop used for food, feed, and bioenergy. Several quality and yield-related traits are of agronomic relevance for rye breeding programs. Profound knowledge of the genetic architecture of these traits is needed to successfully implement marker-assisted selection programs. Nevertheless, little is known about the quantitative trait loci (QTL) underlying important agronomic traits in rye. Results: We used 440 F3:4 inbred lines from two biparental populations (Pop-A, Pop-B) fingerprinted with about 800 to 900 SNP, SSR and/or DArT markers and outcrossed them to a tester for phenotyping. The resulting hybrids and their parents were evaluated for grain yield, single-ear weight, test weight, plant height, thousand-kernel weight, falling number, protein, starch, soluble and total pentosan contents in up to ten environments in Central Europe. The quality of the phenotypic data was high, as reflected by moderate to high heritability estimates. QTL analyses revealed a total of 31 QTL for Pop-A and 52 for Pop-B. QTL x environment interactions were significant (P < 0.01) in most cases, but the variance of the QTL main effect was more prominent. Conclusions: QTL mapping was successfully applied based on two segregating rye populations. QTL underlying grain yield and several quality traits had small effects. In contrast, thousand-kernel weight, test weight, falling number and starch content were affected by several major QTL with a high frequency of occurrence in cross validation. These QTL, which explain a large proportion of the genotypic variance, can be exploited in marker-assisted selection programs and are candidates for further genetic dissection.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 53
    Publication Date: 2012-09-25
    Description: Background: Biologists are elucidating complex collections of genetic regulatory data for multiple organisms. Software is needed for such regulatory network data. Results: The Pathway Tools software provides a comprehensive environment for manipulating molecular regulatory interactions that integrates regulatory data with an organism's genome and metabolic network. The Pathway Tools regulation ontology captures transcriptional and translational regulation, substrate-level regulation of enzyme activity, post-translational modifications, and regulatory pathways. Curated collections of regulatory data are available for Escherichia coli, Bacillus subtilis, and Shewanella oneidensis. Regulatory visualizations include a novel diagram that summarizes all regulatory influences on a gene; a transcription-unit diagram; and an interactive visualization of a full transcriptional regulatory network that can be painted with gene expression data to probe correlations between gene expression and regulatory mechanisms. We introduce a novel type of enrichment analysis that asks whether a gene-expression dataset is over-represented for known regulators. We present algorithms for ranking the degree of regulatory influence of genes, and for computing the net positive and negative regulatory influences on a gene.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 54
    Publication Date: 2012-09-26
    Description: Background: Inverted repeat genes encode precursor RNAs characterized by hairpin structures. These RNA hairpins are then metabolized by biosynthetic pathways to produce functional small RNAs. In eukaryotic genomes, short non-autonomous transposable elements can have similar size and hairpin structures as non-coding precursor RNAs. This resemblance leads to problems annotating small RNAs. Methods: We mapped all microRNA precursors from miRBase to several genomes and studied the repetition and dispersion of the corresponding loci. We then searched for repetitive elements overlapping these loci. Results: We developed an automatic method called ncRNAclassifier to classify pre-ncRNAs according to their relationship with transposable elements (TEs). We show that the number of scattered occurrences of ncRNA precursor candidates is correlated with the presence of TEs. We applied ncRNAclassifier to six chordate genomes and report our findings. Among the 1,426 human and 721 mouse pre-miRNAs of miRBase, we identified 235 and 68 mis-annotated pre-miRNAs, respectively, corresponding completely to TEs. Conclusions: We provide a tool enabling the identification of repetitive elements in precursor ncRNA sequences. ncRNAclassifier is available at http://EvryRNA.ibisc.univ-evry.fr
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
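    Note on record 54: the abstract describes searching for repetitive elements that overlap pre-miRNA loci and flagging loci that correspond completely to TEs. The sketch below is only an illustration of that kind of interval test, not the ncRNAclassifier implementation; the `(chromosome, start, end)` tuple layout, the half-open coordinates, and the function names are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical interval layout: (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def overlaps_any(locus: Interval, repeats: List[Interval]) -> bool:
    """True if the locus shares at least one base with any repeat annotation."""
    chrom, start, end = locus
    return any(c == chrom and s < end and start < e for c, s, e in repeats)


def fully_covered(locus: Interval, repeats: List[Interval]) -> bool:
    """True if a single repeat annotation contains the whole locus."""
    chrom, start, end = locus
    return any(c == chrom and s <= start and end <= e for c, s, e in repeats)


# Toy data, invented for illustration only.
repeats = [("chr1", 100, 500)]
inside = ("chr1", 150, 230)     # lies entirely within the repeat
straddle = ("chr1", 450, 530)   # overlaps the repeat boundary
elsewhere = ("chr2", 150, 230)  # same coordinates, different chromosome
```

    A locus such as `inside` would be a candidate for the "corresponding completely to TEs" category, while `straddle` only overlaps a repeat. A linear scan is adequate for toy data; genome-scale annotation sets would normally call for a sorted or tree-based interval index.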
  • 55
    Publication Date: 2012-09-29
    Description: Background: Owing to the low cost of the high throughput Next Generation Sequencing (NGS) technology, more and more species have been and will be sequenced. However, de novo assemblies of large eukaryotic genomes thus produced are composed of a large number of contigs and scaffolds of medium to small size, having no chromosomal assignment. Radiation hybrid (RH) mapping is a powerful tool for building whole genome maps and has been used for several animal species to help assign sequence scaffolds to chromosomes and determine their order. Results: We report here a duck whole genome RH panel obtained by fusing female duck embryonic fibroblasts irradiated at a dose of 6,000 rads, with HPRT-deficient Wg3hCl2 hamster cells. The ninety best hybrids, having an average retention of 23.6% of the duck genome, were selected for the final panel. To allow the genotyping of large numbers of markers, as required for whole genome mapping, without having to cultivate the hybrid clones on a large scale, three different methods involving Whole Genome Amplification (WGA) and/or scaling down PCR volumes by using the Fluidigm BioMark™ Integrated Fluidic Circuits (IFC) Dynamic Array™ for genotyping were tested. RH maps of APL12 and APL22 were built, allowing the detection of intrachromosomal rearrangements when compared to chicken. Finally, the panel proved useful for checking the assembly of sequence scaffolds and for mapping ESTs located on one of the smallest microchromosomes. Conclusion: The Fluidigm BioMark™ Integrated Fluidic Circuits (IFC) Dynamic Array™ genotyping by quantitative PCR provides a rapid and cost-effective method for building RH linkage groups. Although the vast majority of genotyped markers exhibited a picture coherent with their associated scaffolds, a few of them were discordant, pinpointing potential assembly errors. Comparative mapping with chicken chromosomes GGA21 and GGA11 allowed the detection of the first chromosome rearrangements on microchromosomes between duck and chicken. As in chicken, the smallest duck microchromosomes appear to be missing from the assembly, and more EST data will be needed for mapping them. Altogether, this underlines the added value of RH mapping to improve genome assemblies.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 56
    Publication Date: 2012-09-29
    Description: Background: Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements were explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Results: Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species; however, higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. Conclusions: The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces, with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 57
    Publication Date: 2012-09-29
    Description: Background: The guinea pig (Cavia porcellus) is an important model for human intestinal research. We have characterized the faecal microbiota of 60 guinea pigs using Illumina shotgun metagenomics, and used these data to compile a gene catalogue of its prevalent microbiota. Subsequently, we compared the guinea pig microbiome to existing human gut metagenome data from the MetaHIT project. Results: We found that the bacterial richness obtained for human samples was lower than for guinea pig samples. The intestinal microbiotas of both species were dominated by the two phyla Bacteroidetes and Firmicutes, but at genus level, the majority of identified genera (320 of 376) were differentially abundant in the two hosts. For example, the guinea pig contained considerably more of the mucin-degrading Akkermansia, as well as of the methanogenic archaea Methanobrevibacter, than found in humans. Most microbiome functional categories were less abundant in guinea pigs than in humans. Exceptions included functional categories possibly reflecting dehydration/rehydration stress in the guinea pig intestine. Finally, we showed that microbiological databases have serious anthropocentric biases, which affect model organism research. Conclusions: The results lay the foundation for future gastrointestinal research applying guinea pigs as models for humans.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 58
    Publication Date: 2012-09-29
    Description: Background: The main function of gene promoters appears to be the integration of different gene products in their biological pathways in order to maintain homeostasis. Generally, promoters have been classified in two major classes, namely TATA and CpG. Nevertheless, many genes using the same combinatorial formation of transcription factors have different gene expression patterns. Accordingly, we asked some fundamental questions: Why do certain genes have an overall predisposition for higher gene expression levels than others? What causes such a predisposition? Is there a structural relationship between these sequences in different tissues? Is there a strong phylogenetic relationship between promoters of closely related species? Results: In order to gain insights into different promoter regions, we obtained a series of image-based patterns which allowed us to identify 10 generic classes of promoters. A comprehensive analysis was undertaken for promoter sequences from Arabidopsis thaliana, Drosophila melanogaster, Homo sapiens and Oryza sativa, and a more extensive analysis of tissue-specific promoters in humans. We observed a clear preference for these species to use certain classes of promoters for specific biological processes. Moreover, in humans, we found that different tissues use distinct classes of promoters, reflecting an emerging promoter network. Depending on the tissue type, comparisons made between these classes of promoters reveal a complementarity between their patterns, whereas some other classes of promoters have been observed to occur in competition. Furthermore, we also noticed the existence of some transitional states between these classes of promoters that may explain certain evolutionary mechanisms, which suggest a possible predisposition for specific levels of gene expression and perhaps for a different number of factors responsible for triggering gene expression. Our conclusions are based on comprehensive data from three different databases and a new computer model whose core uses the Kappa Index of Coincidence. Conclusions: To fully understand the connections between gene promoters and gene expression, we analyzed thousands of promoter sequences using our Kappa Index of Coincidence method and a specialized Optical Character Recognition (OCR) neural network. Under our criteria, 10 classes of promoters were detected. In addition, the existence of "transitional" promoters suggests that there is an evolutionary weighted continuum between classes, depending perhaps upon changes in their gene products.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
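    Note on record 58: the abstract names a Kappa Index of Coincidence method but does not define it. As background only, the sketch below computes the classical (Friedman) index of coincidence of a sequence, i.e. the probability that two positions drawn without replacement carry the same symbol. The authors' variant may differ (for example, by using sliding windows or shifted self-comparison); the function name is an assumption made for this example.

```python
from collections import Counter


def index_of_coincidence(seq: str) -> float:
    """Classical index of coincidence: sum n_i(n_i - 1) / (N(N - 1)).

    n_i is the count of symbol i and N the sequence length. This is the
    textbook definition, not necessarily the variant used in the paper.
    """
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two symbols")
    counts = Counter(seq)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Toy sequences, invented for illustration only.
uniform = index_of_coincidence("AAAA")  # every pair of positions matches
mixed = index_of_coincidence("AACC")    # 4 matching ordered pairs out of 12
distinct = index_of_coincidence("ACGT") # no pair of positions matches
```

    Low-complexity sequence pushes the value towards 1, while a sequence that uses all four nucleotides evenly gives a value near 0.25 for long inputs.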
  • 59
    Publication Date: 2012-10-04
    Description: Background: Meiotic maps are a key tool for comparative genomics and association mapping studies. Next-generation sequencing and genotyping by sequencing are speeding the processes of SNP discovery and the development of new genetic tools, including meiotic maps for numerous species. Currently there are limited genetic resources for sockeye salmon, Oncorhynchus nerka. We develop the first dense meiotic map for sockeye salmon using a combination of novel SNPs found in restriction site associated DNA (RAD tags) and SNPs available from existing expressed sequence tag (EST) based assays. Results: We discovered and genotyped putative SNPs in 3,430 RAD tags. We removed paralogous sequence variants leaving 1,672 SNPs; these were combined with 53 EST-based SNP genotypes for linkage mapping. The map contained 29 male and female linkage groups, consistent with the haploid chromosome number expected for sockeye salmon. The female map contains 1,057 loci spanning 4,896 cM, and the male map contains 1,118 loci spanning 4,220 cM. Regions of conservation with rainbow trout and synteny between the RAD based rainbow trout map and the sockeye salmon map were established. Conclusions: Using RAD sequencing and EST-based SNP assays we successfully generated the first high density linkage map for sockeye salmon.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 60
    Publication Date: 2012-10-04
    Description: Background: Brassica oleracea encompasses a family of vegetables, including cabbage, that are among the most widely cultivated crops. In 2009, the B. oleracea Genome Sequencing Project was launched using next generation sequencing technology. None of the available maps were detailed enough to anchor the sequence scaffolds for the Genome Sequencing Project. This report describes the development of a large number of SSR and SNP markers from the whole genome shotgun sequence data of B. oleracea, and the construction of a high-density genetic linkage map using a doubled haploid mapping population. Results: The B. oleracea high-density genetic linkage map that was constructed includes 1,227 markers in nine linkage groups spanning a total of 1197.9 cM with an average of 0.98 cM between adjacent loci. There were 602 SSR markers and 625 SNP markers on the map. The chromosome with the highest number of markers (186) was C03, and the chromosome with the smallest number of markers (99) was C09. Conclusions: This first high-density map allowed the assembled scaffolds to be anchored to pseudochromosomes. The map also provides useful information for positional cloning, molecular breeding, and integration of information of genes and traits in B. oleracea. All the markers on the map will be transferable and could be used for the construction of other genetic maps.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
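    Note on record 60: the marker counts and the average spacing quoted in the abstract can be cross-checked with simple arithmetic. The sketch below assumes that "adjacent loci" means neighbouring markers within a linkage group, so the number of intervals is the marker count minus the number of groups; the abstract does not state how the average was computed, and the function name is an assumption made for this example.

```python
def mean_marker_spacing(total_cm: float, n_markers: int, n_groups: int) -> float:
    """Average distance between adjacent markers, in centimorgans.

    Each linkage group with k markers contributes k - 1 intervals, so the
    total number of intervals is n_markers - n_groups (an assumption about
    how the published average was derived).
    """
    intervals = n_markers - n_groups
    if intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return total_cm / intervals


# Figures taken from the abstract.
ssr_markers, snp_markers = 602, 625
total_markers = ssr_markers + snp_markers
spacing = mean_marker_spacing(1197.9, total_markers, 9)
per_marker = 1197.9 / total_markers  # simpler alternative: map length / markers
```

    Both conventions round to the 0.98 cM reported in the abstract, and the SSR and SNP counts sum to the stated 1,227 markers.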
  • 61
    Publication Date: 2012-10-05
    Description: Background: In bacteria, the weak correlations at the genome scale between mRNA and protein levels suggest that not all mRNAs are translated with the same efficiency. To experimentally explore mRNA translational level regulation at the systemic level, the detailed translational status (translatome) of all mRNAs was measured in the model bacterium Lactococcus lactis in exponential phase growth. Results: Results demonstrated that only part of the entire population of each mRNA species was engaged in translation. For transcripts involved in translation, the polysome size reached a maximum of 18 ribosomes. The fraction of mRNA engaged in translation (ribosome occupancy) and ribosome density were not constant for all genes. This high degree of variability was analyzed by bioinformatics and statistical modeling in order to identify general rules of translational regulation. For most of the genes, the ribosome density was lower than the maximum value revealing major control of translation by initiation. Gene function was a major translational regulatory determinant. Both ribosome occupancy and ribosome density were particularly high for transcriptional regulators, demonstrating the positive role of translational regulation in the coordination of transcriptional networks. mRNA stability was a negative regulatory factor of ribosome occupancy and ribosome density, suggesting antagonistic regulation of translation and mRNA stability. Furthermore, ribosome occupancy was identified as a key component of intracellular protein levels underlining the importance of translational regulation. Conclusions: We have determined, for the first time in a bacterium, the detailed translational status for all mRNAs present in the cell. We have demonstrated experimentally the high diversity of translational states allowing individual gene differentiation and the importance of translation-level regulation in the complex process linking gene expression to protein synthesis.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 62
    Publication Date: 2012-10-06
    Description: Background: Bluefin tunas are highly prized pelagic fish species representing a significant economic resource to fisheries throughout the world. Atlantic bluefin tuna (Thunnus thynnus) populations have significantly declined due to overexploitation. As a consequence of its value and population decline, T. thynnus has been the focus of considerable research effort concerning many aspects of its life history. However, in-depth understanding of T. thynnus reproductive biology is still lacking. Knowledge of reproductive physiology is a very important tool for determining effective fisheries and aquaculture management. Transcriptome techniques are proving powerful and provide novel insights into physiological processes. Construction of a microarray from T. thynnus ESTs sourced from reproductive tissues has provided an ideal platform to study the reproductive physiology of bluefin tunas. The aim of this investigation was to compare transcription profiles from the ovaries and testes of mature T. thynnus to establish sex-specific variations underlying their reproductive physiology. Results: Male and female T. thynnus gonad tissues were collected from the wild and histologically staged. Sub-samples of sexually mature tissues were also measured for their mRNA differential expression between the sexes using the custom microarray design BFT 4X44K. A total of 7068 ESTs were assessed for differential expression of which 1273 ESTs were significantly different (p 2 fold change in expression according to sex. Differential expression for 13 of these ESTs was validated with quantitative PCR. These include genes involved in egg envelope formation, hydration, and lipid transport/accumulation more highly expressed in ovaries compared with testis, while genes involved in meiosis, sperm motility and lipid metabolism were more highly expressed in testis compared with ovaries. Conclusions: This investigation has furthered our knowledge of bluefin tuna reproductive biology by using a contemporary transcriptome approach. Gene expression profiles in T. thynnus sexually mature testes and ovaries were characterized with reference to gametogenesis and potential alternative functions. This report is the first application of microarray technology for bluefin tunas and demonstrates the efficacy with which this technique may be used for further characterization of specific biological aspects of this valuable teleost fish.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 63
    Publication Date: 2012-10-14
    Description: Background: New computational resources are needed to manage the increasing volume of biological data from genome sequencing projects. One fundamental challenge is the ability to maintain a complete and current catalog of protein diversity. We developed a new approach for the identification of protein families that focuses on the rapid discovery of homologous protein sequences. Results: We implemented fully automated and high-throughput procedures to de novo cluster proteins into families based upon global alignment similarity. Our approach employs an iterative clustering strategy in which homologs of known families are sifted out of the search for new families. The resulting reduction in computational complexity enables us to rapidly identify novel protein families found in new genomes and to perform efficient, automated updates that keep pace with genome sequencing. We refer to protein families identified through this approach as "Sifting Families," or SFams. Our analysis of ~10.5 million protein sequences from 2,928 genomes identified 436,360 SFams, many of which are not represented in other protein family databases. We validated the quality of SFam clustering through statistical as well as network topology-based analyses. Conclusions: We describe the rapid identification of SFams and demonstrate how they can be used to annotate genomes and metagenomes. The SFam database catalogs protein-family quality metrics, multiple sequence alignments, hidden Markov models, and phylogenetic trees. Our source code and database are publicly available and will be subject to frequent updates (http://edhar.genomecenter.ucdavis.edu/sifting_families/).
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
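The iterative "sifting" strategy described in the abstract above can be illustrated with a minimal sketch. It is not the SFam implementation: the function names, the 0.8 threshold, the greedy representative-based clustering, and the use of `difflib.SequenceMatcher` as a stand-in for a global alignment score are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in for a global alignment similarity score (an assumption,
    # not the metric used by the SFam pipeline).
    return SequenceMatcher(None, a, b).ratio()


def sift_and_cluster(families, new_seqs, threshold=0.8):
    """Assign new sequences to known families, then cluster the rest de novo.

    families: list of lists; the first member of each list is treated as
    the family representative (hypothetical convention).
    """
    # Step 1: sift out homologs of known families.
    leftovers = []
    for seq in new_seqs:
        for fam in families:
            if similarity(seq, fam[0]) >= threshold:
                fam.append(seq)
                break
        else:
            leftovers.append(seq)

    # Step 2: greedy de novo clustering of only the sequences left over,
    # which is where the reduction in search space comes from.
    new_families = []
    for seq in leftovers:
        for fam in new_families:
            if similarity(seq, fam[0]) >= threshold:
                fam.append(seq)
                break
        else:
            new_families.append([seq])

    families.extend(new_families)
    return families


if __name__ == "__main__":
    known = [["MKTAYIAKQR"]]
    batch = ["MKTAYIAKQK", "GGGGSSSSPP", "GGGGSSSSPA"]
    for family in sift_and_cluster(known, batch):
        print(family)
```

Each new batch of genomes would be passed through the same function, so only sequences without a match in an existing family reach the more expensive de novo step.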
  • 64
    Publication Date: 2012-10-14
    Description: Background: Small non-coding RNAs (sRNAs) have attracted attention as a new class of gene regulators in both eukaryotes and bacteria. Genome-wide screening methods have been successfully applied in Gram-negative bacteria to identify sRNA regulators. Many sRNAs are well characterized, including their target mRNAs and mode of action. In comparison, little is known about sRNAs in Gram-positive pathogens. In this study, we identified novel sRNAs in the exclusively human pathogen Streptococcus pyogenes M49 (Group A Streptococcus, GAS M49), employing a whole genome intergenic tiling array approach. GAS is an important pathogen that causes diseases ranging from mild superficial infections of the skin and mucous membranes of the naso-pharynx, to severe toxic and invasive diseases. Results: We identified 55 putative sRNAs in GAS M49 that were expressed during growth. Of these, 42 were novel. Some of the newly-identified sRNAs belonged to one of the common non-coding RNA families described in the Rfam database. Comparison of the results of our screen with the outcome of two recently published bioinformatics tools showed a low level of overlap between putative sRNA genes. Previously, 40 potential sRNAs have been reported to be expressed in a GAS M1T1 serotype, as detected by a whole genome intergenic tiling array approach. Our screen detected 12 putative sRNA genes that were expressed in both strains. Twenty sRNA candidates appeared to be regulated in a medium-dependent fashion, while eight sRNA genes were regulated throughout growth in chemically defined medium. Expression of candidate genes was verified by reverse transcriptase-qPCR. For a subset of sRNAs, the transcriptional start was determined by 5′ rapid amplification of cDNA ends-PCR (RACE-PCR) analysis. 
Conclusions: In accord with the results of previous studies, we found little overlap between different screening methods, which underlines the fact that a comprehensive analysis of sRNAs expressed by a given organism requires the complementary use of different methods and the investigation of several environmental conditions. Despite a high conservation of sRNA genes within streptococci, the expression of sRNAs appears to be strain specific.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 65
    Publication Date: 2012-10-07
    Description: Background: Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method accounting for correlation among all individuals, a generalized least squares (GLS) method based on half-sib intraclass correlation, and a principal component analysis (PCA) approach. Results: Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced through approximately 10-15 generations of ancestors. Genome and phenotype stratifications had a striking overlap with the half-sib structure. A large elite half-sib family of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cows and contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. The EMMAX method caused the most severe reduction in the number of significant effects, and the PCA method using 20 principal components and GLS had similar significance levels. Removal of the elite cows from the analysis without using stratification correction removed many effects that were also removed by the three methods for stratification correction, indicating that stratification correction could have removed some true effects due to the elite cows. SNP effects with good consensus between different methods and effect size distributions from USDA's Holstein genomic evaluation included the DGAT1-NIBP region of BTA14 for production traits, a SNP 45 kb upstream from PIGY on BTA6 and two SNPs in NIBP on BTA14 for protein percentage. 
However, most of these consensus effects had similar frequencies in the elite and average cows. Conclusions: Genetic selection and extensive use of artificial insemination contributed to overlapped genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction could have removed some true effects associated with genetic selection.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 66
    Publication Date: 2012-10-07
    Description: Background: Bacteria of the genus Arthrobacter are ubiquitous in soil environments and can be considered as true survivalists. Arthrobacter sp. strain Rue61a is an isolate from sewage sludge able to utilize quinaldine (2-methylquinoline) as sole carbon and energy source. The genome provides insight into the molecular basis of the versatility and robustness of this environmental Arthrobacter strain. Results: The genome of Arthrobacter sp. Rue61a consists of a single circular chromosome of 4,736,495 bp with an average G + C content of 62.32%, the circular 231,551-bp plasmid pARUE232, and the linear 112,992-bp plasmid pARUE113 that was already published. Plasmid pARUE232 is proposed to contribute to the resistance of Arthrobacter sp. Rue61a to arsenate and Pb2+, whereas the linear plasmid confers the ability to convert quinaldine to anthranilate. Remarkably, degradation of anthranilate exclusively proceeds via a CoA-thioester pathway. Apart from quinaldine utilization, strain Rue61a has a limited set of aromatic degradation pathways, enabling the utilization of 4-hydroxy-substituted aromatic carboxylic acids, which are characteristic products of lignin depolymerization, via ortho cleavage of protocatechuate. However, 4-hydroxyphenylacetate degradation likely proceeds via meta cleavage of homoprotocatechuate. The genome of strain Rue61a contains numerous genes associated with osmoprotection, and a high number of genes coding for transporters. It encodes a broad spectrum of enzymes for the uptake and utilization of various sugars and organic nitrogen compounds. A. aurescens TC-1 is the closest sequenced relative of strain Rue61a. Conclusions: The genome of Arthrobacter sp. Rue61a reflects the saprophytic lifestyle and nutritional versatility of the organism and a strong adaptive potential to environmental stress. 
The circular plasmid pARUE232 and the linear plasmid pARUE113 contribute to heavy metal resistance and to the ability to degrade quinaldine, respectively.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 67
    Publication Date: 2012-09-22
    Description: Background: Coral reefs are among the most ecologically and economically important ecosystems on our planet. Yet, they are under steady decline worldwide due to rising sea surface temperatures, disease, and pollution. Understanding the molecular impact of these stressors on different coral species is imperative in order to predict how coral populations will respond to this continued disturbance. The use of molecular tools such as microarrays has provided deep insight into the molecular stress response of corals. Here, we have performed comparative genomic hybridizations (CGH) with different coral species to an Acropora palmata microarray platform containing 13,546 cDNA clones in order to identify potentially rapidly evolving genes and to determine the suitability of existing microarray platforms for use in gene expression studies (via heterologous hybridization). Results: Our results showed that the current microarray platform for A. palmata is able to provide biologically relevant information for a wide variety of coral species covering both the complex clade as well as the robust clade. Analysis of the fraction of highly diverged genes showed a significantly higher number of genes without annotation, corroborating previous findings that point towards a higher rate of divergence for taxonomically restricted genes. Among the genes with annotation, we found many mitochondrial genes to be highly diverged in M. faveolata when compared to A. palmata, while the majority of nuclear encoded genes maintained an average divergence rate. Conclusions: The use of present microarray platforms for transcriptional analyses in different coral species will greatly enhance the understanding of the molecular basis of stress and health and highlight evolutionary differences between scleractinian coral species. On a genomic basis, we show that cDNA arrays can be used to identify patterns of divergence. 
Mitochondrion-encoded genes seem to have diverged faster than nuclear encoded genes in robust corals. Accordingly, this needs to be taken into account when using mitochondrial markers for scleractinian phylogenies.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 68
    Publication Date: 2012-09-22
    Description: Background: Chromosome conformation capture experiments result in pairwise proximity measurements between chromosome locations in a genome, and they have been used to construct three-dimensional models of genomic regions, chromosomes, and entire genomes. These models can be used to understand long-range gene regulation, chromosome rearrangements, and the relationships between sequence and spatial location. However, it is unclear whether these pairwise distance constraints provide sufficient information to embed chromatin in three dimensions. A priori, it is possible that an infinite number of embeddings are consistent with the measurements due to a lack of constraints between some regions. It is therefore necessary to separate regions of the chromatin structure that are sufficiently constrained from regions with measurements that do not provide enough information to reconstruct the embedding. Results: We present a new method based on graph rigidity to assess the suitability of experiments for constructing plausible three-dimensional models of chromatin structure. Underlying this analysis is a new, efficient, and accurate algorithm for finding sufficiently constrained (rigid) collections of constraints in three dimensions, a problem for which there is no known efficient algorithm. Applying the method to four recent chromosome conformation experiments, we find that, for even stringently filtered constraints, a large rigid component spans most of the measured region. Filtering highlights higher-confidence regions, and we find that the organization of these regions depends crucially on short-range interactions. Conclusions: Without performing an embedding or creating a frequency-to-distance mapping, our proposed approach establishes which substructures are supported by a sufficient framework of interactions. 
It also establishes that interactions from recent highly filtered genome-wide chromosome conformation experiments provide an adequate set of constraints for embedding. Pre-processing experimentally observed interactions with this method before relating chromatin structure to biological phenomena will ensure that hypothesized correlations are not driven by the arbitrary choice of a particular unconstrained embedding. The software for identifying rigid components is GPL-Licensed and available for download at http://cbcb.umd.edu/kingsford-group/starfish.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 69
    Publication Date: 2012-09-22
    Description: Background: Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. The gap between the number of protein structures deposited in the Worldwide Protein Data Bank and the number of sequenced proteins is constantly widening. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model a structure from a sequence or from partial structural information, e.g. contact maps. Consequently, biologists are able to automatically generate a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. Results: GOBA - Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using the standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. Conclusions: The validation shows that the method is able to discriminate between good and bad model-structures. The best of the tested GOBA scores achieved mean Pearson correlations of 0.74 and 0.8 with the observed quality of models in our CASP8 and CASP9-based validation sets. 
GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 70
    Publication Date: 2012-09-22
    Description: Background: The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. However, a simple application programming interface (API) in a scripting language aimed at the biologist was not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby. Results: The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index, if available, when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Conclusions: Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will help biologists query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help are provided via the website at http://rubyucscapi.userecho.com/.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
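The "bin index" mentioned in the abstract above refers to the hierarchical binning scheme that UCSC tables use to speed up interval queries. The sketch below shows the standard five-level scheme (smallest bins of 128 kb, each level eight times larger) for 0-based, half-open coordinates below 512 Mb. It is written in Python for illustration only; the function names are hypothetical, and nothing here is taken from the Ruby UCSC API source.

```python
# Offsets of the first bin at each level, from the finest (128 kb) level
# to the single 512 Mb bin: 512+64+8+1, 64+8+1, 8+1, 1, 0.
BIN_OFFSETS = (585, 73, 9, 1, 0)
FIRST_SHIFT = 17  # 2**17 = 128 kb, the size of the finest bins
NEXT_SHIFT = 3    # each level up is 2**3 = 8 times larger


def bin_from_range(start: int, end: int) -> int:
    """Smallest bin that fully contains the half-open interval [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("interval does not fit the standard binning scheme")


def overlapping_bins(start: int, end: int) -> list:
    """All bins, at every level, that can hold features overlapping [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    bins = []
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    return bins


if __name__ == "__main__":
    print(bin_from_range(0, 1))
    print(overlapping_bins(0, 1))
```

An interval query then restricts the search to rows whose `bin` column is in `overlapping_bins(start, end)` before applying the exact coordinate comparison, which lets the database use an index on the bin column.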
  • 71
    Publication Date: 2012-09-22
    Description: Background: Pepper (Capsicum annuum L.) is one of the most important vegetable crops worldwide. However, its yield and fruit quality can be severely threatened by several pathogens. The plant nucleotide-binding site (NBS)-leucine-rich repeat (LRR) gene family is the largest class of known disease resistance genes (R genes) effective against such pathogens. Therefore, the isolation and identification of such R gene homologues from pepper will provide a critical foundation for improving disease resistance breeding programs. Results: A total of 78 R gene analogues (CaRGAs) were identified in pepper by degenerate PCR amplification and database mining. Phylogenetic tree analysis of the deduced amino acid sequences for 51 of these CaRGAs with typically conserved motifs (P-loop, kinase-2 and GLPL) along with some known R genes from Arabidopsis and tomato grouped these CaRGAs into the non-Toll interleukin-1 receptor (TIR)-NBS-LRR (CaRGAs I to IV) and TIR-NBS-LRR (CaRGAs V to VII) subfamilies. The presence of consensus motifs (i.e. P-loop, kinase-2 and hydrophobic domain) is typical of the non-TIR- and TIR-NBS-LRR gene subfamilies. This finding further supports the view that both subfamilies are widely distributed in dicot species. Functional divergence analysis provided strong statistical evidence of altered selective constraints during protein evolution between the two subfamilies. Thirteen critical amino acid sites involved in this divergence were also identified using DIVERGE version 2 software. Analyses of non-synonymous and synonymous substitutions per site showed that purifying selection can play a critical role in the evolutionary processes of non-TIR- and TIR-NBS-LRR RGAs in pepper. 
In addition, four specificity-determining positions were predicted to be responsible for functional specificity. qRT-PCR analysis showed that both salicylic and abscisic acids induce the expression of CaRGA genes, suggesting that they may primarily be involved in defence responses by activating signaling pathways. Conclusion: The identified CaRGAs are a valuable resource for discovering R genes and developing RGA molecular markers for genetic map construction. They will also be useful for improving disease resistance in pepper. The findings of this study provide a better understanding of the evolutionary mechanisms that drive the functional diversification of non-TIR- and TIR-NBS-LRR R genes in pepper.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 72
    Publication Date: 2012-10-11
    Description: Background: The classical Bordetella subspecies are phylogenetically closely related, yet differ in some of the most interesting and important characteristics of pathogens, such as host range, virulence and persistence. The compelling picture from previous comparisons of the three sequenced genomes was of genome degradation, with substantial loss of genome content (up to 24%) associated with adaptation to humans. Results: For a more comprehensive picture of lineage evolution, we employed comparative genomic and phylogenomic analyses using seven additional diverse, newly sequenced Bordetella isolates. Genome-wide single nucleotide polymorphism (SNP) analysis supports a reevaluation of the phylogenetic relationships between the classical Bordetella subspecies, and suggests a closer link between ovine and human B. parapertussis lineages than has been previously proposed. Comparative analyses of genome content revealed that only 50% of the pan-genome is conserved in all strains, reflecting substantial diversity of genome content in these closely related pathogens that may relate to their different host ranges, virulence and persistence characteristics. Strikingly, these analyses suggest possible horizontal gene transfer (HGT) events in multiple loci encoding virulence factors, including O-antigen and pertussis toxin (Ptx). Segments of the pertussis toxin locus (ptx) and its secretion system locus (ptl) appear to have been acquired by the classical Bordetella subspecies and are divergent in different lineages, suggesting functional divergence in the classical Bordetellae. Conclusions: Together, these observations, especially in key virulence factors, reveal that multiple mechanisms, such as point mutations, gain or loss of genes, as well as HGTs, contribute to the substantial phenotypic diversity of these versatile subspecies in various hosts.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 73
    Publication Date: 2012-10-11
    Description: Background: The accuracy of genomic prediction is highly dependent on the size of the reference population. For small populations, including information from other populations could improve this accuracy. The usual strategy is to pool data from different populations; however, this has not proven as successful as hoped for with distantly related breeds. BayesRS is a novel approach to share information across populations for genomic predictions. The approach allows information to be captured even where the phase of SNP alleles and causal mutation alleles are reversed across populations, or the actual causal mutation is different between the populations but affects the same gene. Proportions of a four-distribution mixture for SNP effects in segments of fixed size along the genome are derived from one population and set as location specific prior proportions of distributions of SNP effects for the target population. The model was tested using dairy cattle populations of different breeds: 540 Australian Jersey bulls, 2297 Australian Holstein bulls and 5214 Nordic Holstein bulls. The traits studied were protein-, fat- and milk yield. Genotypic data were Illumina 777K SNPs, real or imputed. Results: Results showed an increase in accuracy of up to 3.5% for the Jersey population when using BayesRS with a prior derived from Australian Holstein compared to a model without location specific priors. The increase in accuracy was however lower than was achieved when reference populations were combined to estimate SNP effects, except in the case of fat yield. The small size of the Jersey validation set meant that these improvements in accuracy were not significant using a Hotelling-Williams t-test at the 5% level. An increase in accuracy of 1-2% for all traits was observed in the Australian Holstein population when using a prior derived from the Nordic Holstein population compared to using no prior information. These improvements were significant (P
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 74
    Publication Date: 2012-10-11
    Description: Background: The MYB gene family comprises one of the richest groups of transcription factors in plants. Plant MYB proteins are characterized by a highly conserved MYB DNA-binding domain. MYB proteins are classified into four major groups, namely 1R-MYB, 2R-MYB, 3R-MYB and 4R-MYB, based on the number and position of MYB repeats. MYB transcription factors are involved in plant development, secondary metabolism, hormone signal transduction, disease resistance and abiotic stress tolerance. A comparative analysis of MYB family genes in rice and Arabidopsis will help reveal the evolution and function of MYB genes in plants. Results: A genome-wide analysis identified at least 155 and 197 MYB genes in rice and Arabidopsis, respectively. Gene structure analysis revealed that MYB family genes possess relatively more introns in the middle as compared with C- and N-terminal regions of the predicted genes. Intronless MYB genes are highly conserved both in rice and Arabidopsis. MYB genes encoding R2R3 repeat MYB proteins retained conserved gene structure with three exons and two introns, whereas genes encoding R1R2R3 repeat containing proteins consist of six exons and five introns. The splicing pattern is similar among R1R2R3 MYB genes in Arabidopsis. In contrast, variation in splicing pattern was observed among R1R2R3 MYB members of rice. Consensus motif analysis of the 1 kb upstream region (5′ to translation initiation codon) of MYB gene ORFs led to the identification of conserved and over-represented cis-motifs in both rice and Arabidopsis. Real-time quantitative RT-PCR analysis showed that several members of MYBs are up-regulated by various abiotic stresses both in rice and Arabidopsis. Conclusion: A comprehensive genome-wide analysis of chromosomal distribution, tandem repeats and phylogenetic relationship of MYB family genes in rice and Arabidopsis suggested their evolution via duplication. 
Genome-wide comparative analysis of MYB genes and their expression analysis identified several MYBs with potential roles in development and stress response of plants.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 75
    Publication Date: 2012-10-11
    Description: Background: Coordinated cell growth and development requires that cells regulate the expression of large sets of genes in an appropriate manner, and one of the most complex and metabolically demanding pathways that cells must manage is that of ribosome biogenesis. Ribosome biosynthesis depends upon the activity of hundreds of gene products, and it is subject to extensive regulation in response to changing cellular conditions. We previously described an unusual property of the genes that are involved in ribosome biogenesis in yeast; a significant fraction of the genes exist on the chromosomes as immediately adjacent gene pairs. The incidence of gene pairing can be as high as 24% in some species, and the gene pairs are found in all of the possible tandem, divergent, and convergent orientations. Results: We investigated co-regulated gene sets in S. cerevisiae beyond those related to ribosome biogenesis, and found that a number of these regulons, including those involved in DNA metabolism, heat shock, and the response to cellular stressors were also significantly enriched for adjacent gene pairs. We found that as a whole, adjacent gene pairs were more tightly co-regulated than unpaired genes, and that the specific gene pairing relationships that were most widely conserved across divergent fungal lineages were correlated with those genes that exhibited the highest levels of transcription. Finally, we investigated the gene positions of ribosome related genes across a widely divergent set of eukaryotes, and found a significant level of adjacent gene pairing well beyond yeast species. Conclusion: While it has long been understood that there are connections between genomic organization and transcriptional regulation, this study reveals that the strategy of organizing genes from related, co-regulated pathways into pairs of immediately adjacent genes is widespread, evolutionarily conserved, and functionally significant.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
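The notion of "immediately adjacent gene pairs" in tandem, divergent, and convergent orientations, as used in the abstract above, can be made concrete with a small sketch. The data layout (a positionally ordered list of gene name and strand per chromosome), the function names, and the toy genes are assumptions for illustration; the statistical enrichment test used in the study is not reproduced here.

```python
def adjacent_pairs(chromosome, regulon):
    """Find immediately adjacent gene pairs whose members both belong to a regulon.

    chromosome: list of (gene_name, strand) tuples in positional order,
    with strand given as "+" or "-" (hypothetical input format).
    """
    pairs = []
    for (g1, s1), (g2, s2) in zip(chromosome, chromosome[1:]):
        if g1 in regulon and g2 in regulon:
            if s1 == s2:
                orientation = "tandem"       # both transcribed the same way
            elif s1 == "-":
                orientation = "divergent"    # <-- -->  transcribed apart
            else:
                orientation = "convergent"   # --> <--  transcribed toward each other
            pairs.append((g1, g2, orientation))
    return pairs


def paired_fraction(chromosome, regulon):
    """Fraction of regulon genes on this chromosome that sit in an adjacent pair."""
    paired = set()
    for g1, g2, _ in adjacent_pairs(chromosome, regulon):
        paired.update((g1, g2))
    members = [g for g, _ in chromosome if g in regulon]
    return len(paired) / len(members) if members else 0.0


if __name__ == "__main__":
    # Toy chromosome with made-up gene names.
    chrom = [("A", "+"), ("B", "+"), ("C", "-"), ("D", "+"), ("E", "+"), ("F", "-")]
    print(adjacent_pairs(chrom, {"A", "B", "D", "F"}))
    print(paired_fraction(chrom, {"A", "B", "D", "F"}))
```

Comparing `paired_fraction` for a real regulon against the same statistic for randomly drawn gene sets of equal size would be one simple way to judge whether the pairing is more frequent than expected by chance.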
  • 76
    Publication Date: 2012-10-11
    Description: Background: Myostatin, a member of the TGFbeta superfamily, is well known as a potent and specific negative regulator of muscle growth. Targeting the myostatin signalling pathway may offer promising therapeutic strategies for the treatment of muscle-wasting disorders. In the last decade, various myostatin-binding proteins have been shown to inhibit myostatin activity. One of these is GASP1 (Growth and Differentiation Factor-Associated Serum Protein-1), a protein containing a follistatin domain as well as multiple domains associated with protease inhibitors. Despite in vitro data, remarkably little is known about in vivo functions of Gasp1. To further address the role of GASP1 during mouse development and in adulthood, we generated a gain-of-function transgenic mouse model that overexpresses Gasp1 under transcriptional control of the human cytomegalovirus immediate-early promoter/enhancer. Results: Overexpression of Gasp1 led to an increase in muscle mass that was not observed before day 15 of postnatal life. The surGasp1 transgenic mice did not display any other gross abnormality. Histological and morphometric analysis of surGasp1 rectus femoris muscles revealed an increase in myofiber size without a corresponding increase in myofiber number. Fiber-type distribution was unaltered. Interestingly, we did not detect a change in total fat mass and lean mass. These results differ from those for myostatin knockout mice and for transgenic mice overexpressing the myostatin propeptide or follistatin, which exhibit both muscle hypertrophy and hyperplasia, and show minimal fat deposition. Conclusions: Altogether, our data give new insight into the in vivo functions of Gasp1. Because GASP1 is an extracellular regulatory factor in the myostatin signalling pathway, additional studies on GASP1 and its homolog GASP2 are required to elucidate the crosstalk between the different intrinsic inhibitors of myostatin.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 77
    Publication Date: 2012-09-24
    Description: Background: Theileria parva is a tick-borne protozoan parasite, which causes East Coast Fever, a disease of cattle in sub-Saharan Africa. Like Plasmodium falciparum, the parasite undergoes a transient diploid life-cycle stage in the gut of the arthropod vector, which involves an obligate sexual cycle. As assessed using low-resolution VNTR markers, the crossover (CO) rate in T. parva is relatively high and has been reported to vary across different regions of the genome; non-crossovers (NCOs) and CO-associated gene conversions have not yet been characterised due to the lack of informative markers. To examine all recombination events at high marker resolution, we sequenced the haploid genomes of two parental strains, and two recombinant clones derived from ticks fed on cattle that had been simultaneously co-infected with two different parasite isolates. Results: By comparing the genome sequences, we were able to genotype over 64 thousand SNP markers with an average spacing of 127 bp in the two progeny clones. Previously unrecognized COs in sub-telomeric regions were detected. About 50% of CO breakpoints were accompanied by gene conversion events. Such a high fraction of COs accompanied by gene conversions demonstrated the contributions of meiotic recombination to the diversity and evolutionary success of T. parva, as the process not only redistributed existing genetic variations, but also altered allelic frequencies. Compared to COs, NCOs were more frequently observed and more uniformly distributed across the genome. In both progeny clones, genomic regions with more SNP markers had a reduced frequency of COs or NCOs, suggesting that the sequence divergence between the parental strains was high enough to adversely affect recombination frequencies. Intra-species polymorphism analysis identified 81 loci as likely to be under selection in the sequenced genomes. 
    Conclusions: Using whole genome sequencing of two recombinant clones and their parents, we generated maps of COs, NCOs, and CO-associated gene conversion events for T. parva. The data comprise one of the highest-resolution genome-wide analyses of the multiple outcomes of meiotic recombination for this pathogen. The study also demonstrates the usefulness of high throughput sequencing typing for detailed analysis of recombination in organisms in which conventional genetic analysis is technically difficult.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 78
    Publication Date: 2012-09-25
    Description: Background: Sporadic Amyotrophic Lateral Sclerosis (sALS) is a devastating, complex disease of unknown etiology. We studied this disease with microarray technology to capture as much biological complexity as possible. The Affymetrix-focused BaFL pipeline takes into account problems with probes that arise from physical and biological properties, so we adapted it to handle the long-oligonucleotide probes on our arrays (hence LO-BaFL). The revised method was tested against a validated array experiment and then used in a meta-analysis of peripheral white blood cells from healthy control samples in two experiments. We predicted differentially expressed (DE) genes in our sALS data, combining the results obtained using the TM4 suite of tools with those from the LO-BaFL method. Those predictions were tested using qRT-PCR assays. Results: LO-BaFL filtering and DE testing accurately predicted previously validated DE genes in a published experiment on coronary artery disease (CAD). Filtering healthy control data from the sALS and CAD studies with LO-BaFL resulted in highly correlated expression levels across many genes. After bioinformatics analysis, twelve genes from the sALS DE gene list were selected for independent testing using qRT-PCR assays. High-quality RNA from six healthy Control and six sALS samples yielded the predicted differential expression for 7 genes: TARDBP, SKIV2L2, C12orf35, DYNLT1, ACTG1, B2M, and ILKAP. Four of the seven have been previously described in sALS studies, while ACTG1, B2M and ILKAP appear in the context of this disease for the first time. Supplementary material can be accessed at: http://webpages.uncc.edu/~cbaciu/LO-BaFL/supplementary_data.html Conclusion: LO-BaFL predicts DE results that are broadly similar to those of other methods. The small healthy control cohort in the sALS study is a reasonable foundation for predicting DE genes. 
    Modifying the BaFL pipeline allowed us to remove noise and systematic errors, improving the power of this study, which had a small sample size. Each bioinformatics approach revealed DE genes not predicted by the other; subsequent PCR assays confirmed seven of twelve candidates, a relatively high success rate.
    Electronic ISSN: 1471-2105
    Topics: Biology, Computer Science
    Published by BioMed Central
  • 79
    Publication Date: 2012-09-25
    Description: Background: The function of RNA from the non-coding (the so-called "dark matter") regions of the genome has been a subject of considerable recent debate. Perhaps the greatest controversy concerns the function of RNAs found in introns of annotated transcripts, where most of the reads that map outside of exons are usually found. However, it has been reported that the levels of RNA in introns are minor relative to those of the corresponding exons, and that changes in the levels of intronic RNAs correlate tightly with those of adjacent exons. This would suggest that RNAs produced from the vast expanse of intronic space are just pieces of pre-mRNAs or excised introns en route to degradation. Results: We present data that challenge the notion that intronic RNAs are mere bystanders in the cell. By performing a highly quantitative RNAseq analysis of transcriptome changes during an inflammation time course, we show that intronic RNAs have a number of features that would be expected from functional, standalone RNA species. We show that there are thousands of introns in the mouse genome that generate RNAs whose overall abundance, which changes throughout the inflammation time course, and other properties suggest that they function in yet unknown ways. Conclusions: So far, the focus of non-coding RNA discovery has shied away from intronic regions as those were believed to simply encode parts of pre-mRNAs. Results presented here suggest a very different situation: the sequences encoded in the introns appear to harbor a yet unexplored reservoir of novel, functional RNAs. As such, they should not be ignored in surveys of functional transcripts or other genomic studies.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 80
    Publication Date: 2012-09-25
    Description: Background: Metastasis is characterized by spreading of neoplastic cells to an organ other than where they originated and is the predominant cause of death among cancer patients. This holds true for melanoma, whose incidence is increasing more rapidly than any other cancer and once disseminated has few therapeutic options. Here we performed whole exome sequencing of two sets of matched normal and metastatic tumor DNAs. Results: Using stringent criteria, we evaluated the similarities and differences between the lesions. We find that in both cases, 96% of the single nucleotide variants are shared between the two metastases indicating that clonal populations gave rise to the distant metastases. Analysis of copy number variation patterns of both metastatic sets revealed a trend similar to that seen with our single nucleotide variants. Analysis of pathway enrichment on tumor sets shows commonly mutated pathways enriched between individual sets of metastases and all metastases combined. Conclusions: These data provide a proof-of-concept suggesting that individual metastases may have sufficient similarity for successful targeting of driver mutations.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 81
    Publication Date: 2012-09-26
    Description: Background: Community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) is a significant bacterial pathogen that poses considerable clinical and public health challenges. The majority of the CA-MRSA disease burden consists of skin and soft tissue infections (SSTI) not associated with significant morbidity; however, CA-MRSA also causes severe, invasive infections resulting in significant morbidity and mortality. The broad range of disease severity may be influenced by bacterial genetic variation. Results: We sequenced the complete genomes of 36 CA-MRSA clinical isolates from the predominant North American community acquired clonal type USA300 (18 SSTI and 18 severe infection-associated isolates). While all 36 isolates shared remarkable genetic similarity, we found greater overall time-dependent sequence diversity among SSTI isolates. In addition, pathway analysis of non-synonymous variations revealed increased sequence diversity in the putative virulence genes of SSTI isolates. Conclusions: Here we report the first whole genome survey of diverse clinical isolates of the USA300 lineage and describe the evolution of the pathogen over time within a defined geographic area. The results demonstrate the close relatedness of clinically independent CA-MRSA isolates, which carry implications for understanding CA-MRSA epidemiology and combating its spread.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 82
    Publication Date: 2012-09-26
    Description: Background: Genomic and transcriptomic approaches have the potential for unveiling the genome-wide response to environmental perturbations. The abundance of the catadromous European eel (Anguilla anguilla) stock has been declining since the 1980s probably due to a combination of anthropogenic and climatic factors. In this paper, we explore the transcriptomic dynamics between individuals from high (river Tiber, Italy) and low pollution (lake Bolsena, Italy) environments, which were measured for 36 PCBs, several organochlorine pesticides and brominated flame retardants and nine metals. Results: To this end, we first (i) updated the European eel transcriptome using deep sequencing data with a total of 640,040 reads assembled into 44,896 contigs (Eeelbase release 2.0), and (ii) developed a transcriptomic platform for global gene expression profiling in the critically endangered European eel of about 15,000 annotated contigs, which was applied to detect differentially expressed genes between polluted sites. Several detoxification genes related to metabolism of pollutants were upregulated in the highly polluted site, including genes that take part in phase I of the xenobiotic metabolism (CYP3A), phase II (glutathione-S-transferase) and oxidative stress (glutathione peroxidase). In addition, key genes in the mitochondrial respiratory chain and oxidative phosphorylation were down-regulated at the Tiber site relative to the Bolsena site. Conclusions: Together with the induced high expression of detoxification genes, the suggested lowered expression of genes supposedly involved in metabolism suggests that pollution may also be associated with decreased respiratory and energy production.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 83
    Publication Date: 2012-09-26
    Description: Background: The events leading to sepsis start with an invasive infection of a primary organ of the body followed by an overwhelming systemic response. Intra-abdominal infections are the second most common cause of sepsis. Peritoneal fluid is the primary site of infection in these cases. A microarray-based approach was used to study the temporal changes in cells from the peritoneal cavity of septic mice and to identify potential biomarkers and therapeutic targets for this subset of sepsis patients. Results: We conducted microarray analysis of the peritoneal cells of mice infected with a non-pathogenic strain of Escherichia coli. Differentially expressed genes were identified at two early (1 h, 2 h) and one late time point (18 h). A multiplexed bead array analysis was used to confirm protein expression for several cytokines which showed differential expression at different time points based on the microarray data. Gene Ontology based hypothesis testing identified a positive bias of differentially expressed genes associated with cellular development and cell death at 2 h and 18 h, respectively. Most differentially expressed genes common to all 3 time points had an immune response related function, consistent with the observation that a few bacteria are still present at 18 h. Conclusions: Transcriptional regulators like PLAGL2, EBF1, TCF7, KLF10 and SBNO2, previously not described in sepsis, are differentially expressed at early and late time points. The expression pattern for key biomarkers in this study is similar to that reported in human sepsis, indicating the suitability of this model for future studies of sepsis, and the observed differences in gene expression suggest species differences or differences in the response of blood leukocytes and peritoneal leukocytes.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 84
    Publication Date: 2012-09-27
    Description: Background: With the advent of next-generation sequencing (NGS) technologies, full cDNA shotgun sequencing has become a major approach in the study of transcriptomes, and several different protocols in 454 sequencing have been invented. As each protocol uses its own short DNA tags or adapters attached to the ends of cDNA fragments for labeling or sequencing, different contaminants may lead to mis-assembly and inaccurate sequence products. Results: We have designed and implemented a new program for raw sequence cleaning in a graphical user interface and a batch script. The cleaning process consists of several modules including barcode trimming, sequencing adapter trimming, amplification primer trimming, poly-A tail trimming, vector screening and low quality region trimming. These modules can be combined based on various sequencing applications. Conclusions: ESTclean is a software package not only for cleaning cDNA sequences, but also for helping to develop sequencing protocols by providing summary tables and figures for sequencing quality control in a graphical user interface. It outperforms in cleaning read sequences from complicated sequencing protocols which use barcodes and multiple amplification primers.
    Electronic ISSN: 1471-2105
    Topics: Biology, Computer Science
    Published by BioMed Central
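The ESTclean abstract above lists several trimming modules. As a rough illustration of what two of them do (adapter trimming and poly-A tail trimming), here is a minimal sketch. It is not ESTclean's implementation: the function names, the exact-prefix adapter match, and the `min_run` threshold of 8 bases are all assumptions made for this example.

```python
import re


def trim_adapter(read, adapter):
    """Remove a known adapter if the read starts with it (exact match only)."""
    return read[len(adapter):] if read.startswith(adapter) else read


def trim_poly_a(read, min_run=8):
    """Remove a trailing run of at least `min_run` A bases from the 3' end."""
    return re.sub(r"A{%d,}$" % min_run, "", read)


def clean_read(read, adapter, min_run=8):
    """Apply the two hypothetical modules in sequence: adapter, then poly-A."""
    return trim_poly_a(trim_adapter(read, adapter), min_run)


if __name__ == "__main__":
    raw = "GATC" + "ACGTTGC" + "A" * 10
    print(clean_read(raw, adapter="GATC"))
```

Real cleaning tools typically tolerate sequencing errors in adapters and tails; this sketch deliberately does not, to keep the two steps easy to read.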
  • 85
    Publication Date: 2012-09-27
    Description: Background: While the genetics of diploid inheritance are well studied and software for linkage mapping, haplotyping and QTL analysis is available, for tetraploids the available tools are limited. In order to develop such tools it would be helpful if simulated populations based on a variety of models of tetraploid meiosis were available. Results: Here we present PedigreeSim, a software package that simulates meiosis in both diploid and tetraploid species and uses this to simulate pedigrees and cross populations. For tetraploids a variety of models can be used, including both bivalent and quadrivalent formation, varying degrees of preferential pairing of hom(oe)ologous chromosomes, different quadrivalent configurations and more. Simulation of quadrivalent meiosis results, as expected, in double reduction and recombination between more than two hom(oe)ologous chromosomes. The results are shown to match theoretical predictions. Conclusions: This is the first simulation software that implements all features of meiosis in tetraploids. It allows users to generate data for tetraploid and diploid populations, and to investigate different models of tetraploid meiosis. The software and manual are available from http://www.plantbreeding.wur.nl/UK/software_pedigreeSim.html and as Additional files 1, 2, 3 and 4 with this publication.
    Electronic ISSN: 1471-2105
    Topics: Biology, Computer Science
    Published by BioMed Central
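To make the simplest of the tetraploid models mentioned in the PedigreeSim abstract concrete, the sketch below enumerates gametes under random bivalent pairing of four homologs, with no recombination, no preferential pairing and no quadrivalents. It is not PedigreeSim's code; the homolog labels and function names are assumptions for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

HOMOLOGS = ("A", "B", "C", "D")  # hypothetical labels for the four homologs


def bivalent_pairings(homologs):
    """List the three ways to split four homologs into two bivalents."""
    first, rest = homologs[0], homologs[1:]
    pairings = []
    for partner in rest:
        others = tuple(h for h in rest if h != partner)
        pairings.append(((first, partner), others))
    return pairings


def gamete_distribution(homologs=HOMOLOGS):
    """Probability of each two-homolog gamete under random bivalent pairing."""
    counts = Counter()
    for pair_one, pair_two in bivalent_pairings(homologs):
        # Each gamete receives one homolog from each bivalent.
        for a, b in product(pair_one, pair_two):
            counts[frozenset((a, b))] += 1
    total = sum(counts.values())
    return {gamete: Fraction(n, total) for gamete, n in counts.items()}


if __name__ == "__main__":
    for gamete, prob in sorted(gamete_distribution().items(), key=lambda kv: sorted(kv[0])):
        print("".join(sorted(gamete)), prob)
```

Under these assumptions all six two-homolog combinations are equally likely. Preferential pairing would weight the three pairings unequally, and quadrivalent formation would additionally allow double reduction, which this enumeration cannot produce.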
  • 86
    Publication Date: 2012-10-03
    Description: Background: The filamentous fungus Aspergillus fumigatus has become the most important airborne fungal pathogen causing life-threatening infections in immuno-compromised patients. Recently developed high-throughput transcriptome and proteome technologies, such as microarrays, RNA deep-sequencing, and LC-MS/MS of peptide mixtures, are of enormous value for systematically investigating pathogenic organisms. In the field of infection biology, one of the priorities is to collect and standardise data, in order to generate datasets that can be used to investigate and compare pathways and gene responses involved in pathogenicity. The "omics" era provides a multitude of inputs that need to be integrated and assessed. We therefore evaluated the potential of paired-end mRNA-Seq for investigating the regulatory role of the central mitogen activated protein kinase (MpkA). This kinase is involved in the cell wall integrity signalling pathway of A. fumigatus and essential for maintaining an intact cell wall in response to stress. Results: The comparison of the transcriptome and proteome of an A. fumigatus wild-type strain with an mpkA null mutant strain revealed that 70.4% of the genome was found to be expressed and that MpkA plays a significant role in the regulation of many genes involved in cell wall remodelling, oxidative stress and iron starvation response, and secondary metabolite biosynthesis. Moreover, absence of the mpkA gene also strongly affects the expression of genes involved in primary metabolism. The data were further processed to evaluate the potential of the mRNA-Seq technique. We comprehensively matched up our data to published transcriptome studies and were able to show an improved data comparability of mRNA-Seq experiments independently of the technique used. Analysis of transcriptome and proteome data revealed only a weak correlation between mRNA and protein abundance. 
    Conclusions: High-throughput analysis of MpkA-dependent gene expression confirmed many previous findings that this kinase is important for regulating many genes involved in metabolic pathways. Our analysis showed more than 2000 differentially regulated genes. RNA deep-sequencing is less error-prone than established microarray-based technologies. It also provides additional information in A. fumigatus studies and as a result is more suitable for the creation of extensive datasets.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 87
    Publication Date: 2012-10-04
    Description: Background: Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement. Results: The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs), e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easily integrated into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components. Conclusions: VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information easily, clearly, and without ambiguity.
    Electronic ISSN: 1471-2105
    Topics: Biology, Computer Science
    Published by BioMed Central
  • 88
    Publication Date: 2012-10-04
    Description: Background: We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees. Results: We develop a heuristic solution that aims to find MFASTs in sets of many, large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle data sets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods. Conclusions: Although this heuristic does not guarantee to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical data sets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.
    Electronic ISSN: 1471-2105
    Topics: Biology, Computer Science
    Published by BioMed Central
  • 89
    Publication Date: 2012-10-04
    Description: Background: Currently, there is no open-source, cross-platform and scalable framework for coalescent analysis in population genetics. There is no scalable GUI-based user application either. Such a framework and application would not only drive the creation of more complex and realistic models but also make them truly accessible. Results: As a first attempt, we built a framework and user application for the domain of exact calculations in coalescent analysis. The framework provides an API with the concepts of model, data, statistic, phylogeny, gene tree and recursion. Infinite-alleles and infinite-sites models are considered. It defines pluggable computations such as counting and listing all the ancestral configurations and genealogies and computing the exact probability of data. It can visualize a gene tree, trace and visualize the internals of the recursion algorithm for further improvement, and dynamically attach a number of output processors. The user application defines jobs in a plug-in like manner so that they can be activated, deactivated, installed or uninstalled on demand. Multiple jobs can be run and their inputs edited. Job inputs are persisted across restarts and running jobs can be cancelled where applicable. Conclusions: Coalescent theory plays an increasingly important role in analysing molecular population genetic data. Models involved are mathematically difficult and computationally challenging. An open-source, scalable framework that lets users immediately take advantage of the progress made by others will enable exploration of yet more difficult and realistic models. As models become more complex and mathematically less tractable, the need for an integrated computational approach is obvious. Object-oriented designs, though they have upfront costs, are practical now and can provide such an integrated approach.
    Electronic ISSN: 1471-2105
    Topics: Biology, Computer Science
    Published by BioMed Central
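The coalescent-framework abstract above mentions computing the exact probability of data under the infinite-alleles model. For the neutral infinite-alleles model, the probability of an unlabeled allele-frequency configuration has a well-known closed form, the Ewens sampling formula, which the sketch below evaluates with exact rational arithmetic. This is not the framework's API or its recursion; the function name and argument layout are assumptions for this example.

```python
from fractions import Fraction
from math import factorial


def ewens_probability(allele_counts, theta):
    """Ewens sampling formula.

    allele_counts[j - 1] is the number of allele types observed exactly j
    times in the sample; theta is the scaled mutation rate.
    """
    n = sum(j * a for j, a in enumerate(allele_counts, start=1))
    theta = Fraction(theta)
    rising = Fraction(1)  # theta * (theta + 1) * ... * (theta + n - 1)
    for k in range(n):
        rising *= theta + k
    prob = Fraction(factorial(n)) / rising
    for j, a in enumerate(allele_counts, start=1):
        prob *= (theta / j) ** a / factorial(a)
    return prob


if __name__ == "__main__":
    # Sample of size 3: three singletons, one singleton plus one pair, one triple.
    for config in ([3, 0, 0], [1, 1, 0], [0, 0, 1]):
        print(config, ewens_probability(config, theta=2))
```

The probabilities over all configurations of a given sample size sum to one, which gives a convenient sanity check for any recursive implementation.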
  • 90
    Publication Date: 2012-10-04
    Description: Background: Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson's correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Results: Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. In contrast, the interpretation of Pearson's r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen's simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. Conclusions: SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
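The SERE abstract above describes the score's interpretation but not its formula. The sketch below shows one way a ratio of observed to Poisson-expected variation can be computed so that it behaves as described (0 for duplicated data, about 1 for Poisson replicates, above 1 for global differences). The specific formula used here, the square root of the mean per-bin chi-square dispersion against depth-scaled expected counts, is an assumption for illustration and should be checked against the paper before use.

```python
import math


def sere(counts):
    """Ratio of observed to Poisson-expected variation across libraries.

    counts: list of bins (genes or exons), each a list of read counts with
    one entry per library. Every library must contain at least one read.
    """
    n_libs = len(counts[0])
    lib_totals = [sum(row[j] for row in counts) for j in range(n_libs)]
    grand_total = sum(lib_totals)
    dispersion_sum = 0.0
    n_bins = 0
    for row in counts:
        row_total = sum(row)
        if row_total == 0:
            continue  # bins without reads carry no information
        chi_square = 0.0
        for j in range(n_libs):
            # Expected count if the bin's reads were split by library depth.
            expected = row_total * lib_totals[j] / grand_total
            chi_square += (row[j] - expected) ** 2 / expected
        dispersion_sum += chi_square / (n_libs - 1)
        n_bins += 1
    return math.sqrt(dispersion_sum / n_bins)


if __name__ == "__main__":
    print(sere([[10, 10], [200, 200], [3, 3]]))  # duplicated libraries
    print(sere([[100, 0], [0, 100]]))            # globally different libraries
```

Because expected counts are scaled by library size, a library that is an exact multiple of another also scores 0 under this sketch, so sequencing depth alone does not inflate the score.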
  • 91
    Publication Date: 2012-10-04
    Description: Background: Switchgrass (Panicum virgatum) is a herbaceous crop for cellulosic biofuel feedstock development in the USA and Europe. As switchgrass is a naturally outcrossing species, accurate identification of selfed progeny is important to producing inbreds, which can be used in the production of heterotic hybrids. Development of a technically reliable, time-saving and easily used marker system is needed to quantify and characterize breeding origin of progeny plants of targeted parents. Results: Genome-wide screening of 915 mapped microsatellite (simple sequence repeat, SSR) markers was conducted, and 842 (92.0%) produced clear and scorable bands on a pooled DNA sample of eight switchgrass varieties. A total of 166 primer pairs were selected on the basis of their relatively even distribution in the switchgrass genome and PCR amplification quality on 16 tetraploid genotypes. Mean polymorphic information content value for the 166 markers was 0.810, ranging from 0.116 to 0.959. From them, a core set of 48 loci, which had been mapped on 17 linkage groups, was further tested and optimized to develop 24 sets of duplex markers. Most (up to 87.5%) of the targeted but non-allelic amplicons within each duplex were separated by more than 10 bp. Using the established duplex PCR protocol, selfing ratio (i.e., selfed/all progeny × 100%) was identified as 0% for a randomly selected open-pollinated 'Kanlow' genotype grown in the field, 15.4% for 22 field-grown plants of bagged inflorescences, and 77.3% for a selected plant grown in a growth chamber. Conclusions: The study developed a duplex SSR-based PCR protocol consisting of 48 markers, providing ample choices of non-tightly-linked loci across the whole switchgrass genome, and representing a powerful, time-saving and easily used method for the identification of selfed progeny in switchgrass. The protocol should be a valuable tool in switchgrass breeding efforts.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
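The switchgrass abstract above reports polymorphic information content (PIC) values and a selfing ratio. The sketch below computes both quantities. The PIC function uses the standard Botstein et al. definition, which is assumed (not confirmed by the abstract) to be the one the authors used; the selfing ratio follows the formula given in the abstract (selfed/all progeny × 100%). Function names and the example inputs are hypothetical.

```python
def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross_term += 2 * freqs[i] ** 2 * freqs[j] ** 2
    return 1.0 - homozygosity - cross_term


def selfing_ratio(selfed, total_progeny):
    """Percentage of progeny identified as selfed."""
    if total_progeny <= 0:
        raise ValueError("total_progeny must be positive")
    return 100.0 * selfed / total_progeny


if __name__ == "__main__":
    print(polymorphic_information_content([0.5, 0.5]))
    print(polymorphic_information_content([0.25, 0.25, 0.25, 0.25]))
    print(selfing_ratio(3, 12))
```

A marker with one fixed allele has a PIC of 0, and the value approaches 1 as the number of equally frequent alleles grows, which is why highly multi-allelic SSR markers tend to score high.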
  • 92
    Publication Date: 2012-10-05
    Description: Background: Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whitefly species, Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED). Results: More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp) and 16,596 sequences were annotated with an E-value above 10^-5. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. At the same time, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84%) and much higher than that of MEAM1 and MED (0.83%). This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identity data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation. 
    We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies, which are helpful for investigating associations between alleles and phenotypes. Conclusions: Our data present the most comprehensive sequences for the indigenous whitefly species Asia II 3. The extensive comparisons of Asia II 3, MEAM1 and MED transcriptomes will serve as an invaluable resource for revealing the genetic basis of whitefly invasion and the molecular mechanisms underlying their biological differences.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
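The coding-sequence divergence percentages quoted in this record (for example 1.73% between Asia II 3 and MEAM1) can be read as the share of differing sites between aligned orthologous sequences. The abstract does not state how the authors computed them, so the following is only a minimal sketch of a simple p-distance, assuming the two sequences are already aligned; the function name `percent_divergence` and the gap symbol are illustrative assumptions, not part of the study.

```python
def percent_divergence(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of mismatching sites between two pre-aligned sequences.

    Illustrative p-distance only; not the method used in the article.
    Columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0    # columns where both sequences have a base
    mismatches = 0  # compared columns where the bases differ
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return 100.0 * mismatches / compared
```

Averaging this value over all orthologous gene pairs would give one overall figure per species pair; whether the authors weighted by sequence length is not stated in the abstract.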
  • 93
    Publication Date: 2012-10-05
    Description: Background: Cytochrome P450 proteins (CYPs) play diverse and pivotal roles in fungal metabolism and adaptation to specific ecological niches. Fungal genomes encode extremely variable "CYPomes" ranging from one to more than 300 CYPs. Despite the rapid growth of sequenced fungal and oomycete genomes and the resulting influx of predicted CYPs, the vast majority of CYPs remain functionally uncharacterized. To facilitate the curation and functional and evolutionary studies of CYPs, we previously developed Fungal Cytochrome P450 Database (FCPD), which included CYPs from 70 fungal and oomycete species. Here we present a new version of FCPD (1.2) with more data and an improved classification scheme. Results: The new database contains 22,940 CYPs from 213 species divided into 2,579 clusters and 115 clans. By optimizing the clustering pipeline, we were able to uncover 36 novel clans and to assign 153 orphan CYP families to specific clans. To augment their functional annotation, CYP clusters were mapped to David Nelson's P450 databases, which archive a total of 12,500 manually curated CYPs. Additionally, over 150 clusters were functionally classified based on sequence similarity to experimentally characterized CYPs. Comparative analysis of fungal and oomycete CYPomes revealed cases of both extreme expansion and contraction. The most dramatic expansions in fungi were observed in clans CYP58 and CYP68 (Pezizomycotina), clans CYP5150 and CYP63 (Agaricomycotina), and family CYP509 (Mucoromycotina). Although much of the extraordinary diversity of the pan-fungal CYPome can be attributed to gene duplication and adaptive divergence, our analysis also suggests a few potential horizontal gene transfer events. Updated families and clans can be accessed through the new version of the FCPD database. 
Conclusions: FCPD version 1.2 provides a systematic and searchable catalogue of 9,550 fungal CYP sequences (292 families) encoded by 108 fungal species and 147 CYP sequences (9 families) encoded by five oomycete species. In comparison to the first version, it offers a more comprehensive clan classification, is fully compatible with Nelson's P450 databases, and has expanded functional categorization. These features will facilitate functional annotation and classification of CYPs encoded by newly sequenced fungal and oomycete genomes. Additionally, the classification system will aid in studying the roles of CYPs in the evolution of fungal adaptation to specific ecological niches.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 94
    Publication Date: 2012-10-05
    Description: Background: The relative contribution of epigenetic mechanisms to carcinogenesis is not well understood, including the extent to which epigenetic dysregulation and somatic mutations target similar genes and pathways. We hypothesize that during carcinogenesis, certain pathways or biological gene sets are commonly dysregulated via DNA methylation across cancer types. The ability of our logistic regression-based gene set enrichment method to implicate important biological pathways in high-throughput data is well established. Results: We developed a web-based gene set enrichment application called LRpath with clustering functionality that allows for identification and comparison of pathway signatures across multiple studies. Here, we employed LRpath analysis to unravel the commonly altered pathways and other gene sets across ten cancer studies employing DNA methylation data profiled with the Illumina HumanMethylation27 BeadChip. We observed a surprising level of concordance in differential methylation across multiple cancer types. For example, among commonly hypomethylated groups, we identified immune-related functions, peptidase activity, and epidermis/keratinocyte development and differentiation. Commonly hypermethylated groups included homeobox and other DNA-binding genes, nervous system and embryonic development, and voltage-gated potassium channels. For many gene sets, we observed significant overlap in the specific subset of differentially methylated genes. Interestingly, fewer DNA repair genes were differentially methylated than expected by chance. Conclusions: Clustering analysis performed with LRpath revealed tightly clustered concepts enriched for differential methylation. Several well-known cancer-related pathways were significantly affected, while others were depleted in differential methylation. We conclude that DNA methylation changes in cancer tend to target a subset of the known cancer pathways affected by genetic aberrations.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 95
    Publication Date: 2012-10-06
    Description: Background: DNA methylation is a fundamental component of epigenetic modification, which is intimately involved in the regulation of gene expression. One important DNA methylation pathway reduces the abilities of transcription factors to bind to gene promoter regions. Although many experiments have been designed to measure genome-wide DNA methylation levels at high resolution, the effect of these different DNA methylation levels on transcription factor binding abilities remains poorly understood. We have, therefore, developed a method to quantitatively explore the extent to which DNA methylation levels can significantly reduce or even abolish the binding of certain transcription factors, resulting in reduced or non-expression of flanking genes. This method allows transcription factors that are functionally active in gene expression to be investigated. Results: The method is based on a general model that depicts the relationship between DNA methylation and transcription factor binding ability based on intrinsic component properties, and the model parameters can be optimized through relative analysis of recognized transcription factor binding status and gene expression profiling. With fixed models, transcription factors functionally active in the regulation of gene expression and affected by epigenetic DNA methylation can be identified and subsequently confirmed. The method identified eleven apparently functionally active transcription factors in SH-SY5Y neuroblastoma cells. Conclusions: Compared with gene regulatory elements, epigenetic modifications can change dynamically in response to signals from physical, biological and social environments. Our proposed method is therefore designed to provide a dynamic assessment to investigate functionally active transcription factors. 
With the information deduced from our method, we can predict transcription factor binding status in promoter regions to further investigate how a particular gene is regulated by a specific group of transcription factors organized in a particular pattern. This will be helpful in the diagnosis and development of treatment for numerous diseases, including cancer. Although the method only investigates DNA methylation, it has the potential to be applied to more epigenetic factors, such as histone modification.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 96
    Publication Date: 2012-10-06
    Description: Background: Clinical Bioinformatics is currently growing and is based on the integration of clinical and omics data aiming at the development of personalized medicine. Thus the introduction of novel technologies able to investigate the relationship among clinical states and biological machineries may help the development of this field. For instance, the Affymetrix DMET platform (drug metabolism enzymes and transporters) is able to study the relationship between genomic variation in patients and drug metabolism, detecting SNPs (Single Nucleotide Polymorphisms) on genes related to drug metabolism. This may make it possible, for instance, to find genetic variants in patients who present different drug responses, in pharmacogenomics and clinical studies. Despite this, there is currently a lack of open-source algorithms and tools for the analysis of DMET data. Existing software tools for DMET data generally allow only the preprocessing of binary data (e.g. the DMET-Console provided by Affymetrix) and simple data analysis operations, but do not allow testing of the association between the presence of SNPs and the response to drugs. Results: We developed DMET-Analyzer, a tool for the automatic association analysis between the variation of patient genomes and the clinical conditions of patients, i.e. the different responses to drugs. The proposed system allows: (i) automation of the workflow of analysis of DMET-SNP data, avoiding the use of multiple tools; (ii) automatic annotation of DMET-SNP data and searching of existing SNP databases (e.g. dbSNP); (iii) association of SNPs with pathways through searches in PharmGKB, a major knowledge base for pharmacogenomic studies. DMET-Analyzer has a simple graphical user interface that allows users (doctors/biologists) to upload and analyse DMET files produced by Affymetrix DMET-Console in an interactive way. 
The effectiveness and ease of use of DMET-Analyzer are demonstrated through different case studies regarding the analysis of clinical datasets produced at the University Hospital of Catanzaro, Italy. Conclusion: DMET-Analyzer is a novel tool able to automatically analyse data produced by the DMET platform in case-control association studies. Using such a tool, users may avoid wasting time on the manual execution of multiple statistical tests, avoiding possible errors and reducing the amount of time needed for a whole experiment. Moreover, annotations and the direct link to external databases may increase the biological knowledge extracted. The system is freely available for academic purposes at: https://sourceforge.net/projects/dmetanalyzer/files/
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 97
    Publication Date: 2012-10-06
    Description: Background: The broad ecological distribution of L. casei makes it an insightful subject for research on genome evolution and lifestyle adaptation. To explore evolutionary mechanisms that determine genomic diversity of L. casei, we performed comparative analysis of 17 L. casei genomes representing strains collected from dairy, plant, and human sources. Results: Differences in L. casei genome inventory revealed an open pan-genome comprised of 1,715 core and 4,220 accessory genes. Extrapolation of pan-genome data indicates L. casei has a supragenome approximately 3.2 times larger than the average genome of individual strains. Evidence suggests horizontal gene transfer from other bacterial species, particularly lactobacilli, has been important in adaptation of L. casei to new habitats and lifestyles, but evolution of dairy niche specialists also appears to involve gene decay. Conclusions: Genome diversity in L. casei has evolved through gene acquisition and decay. Acquisition of foreign genomic islands likely confers a fitness benefit in specific habitats, notably plant-associated niches. Loss of unnecessary ancestral traits in strains collected from bacterial-ripened cheeses supports the hypothesis that gene decay contributes to enhanced fitness in that niche. This study gives the first evidence for an L. casei supragenome and provides valuable insights into mechanisms for genome evolution and lifestyle adaptation of this ecologically flexible and industrially important lactic acid bacterium. Additionally, our data confirm the Distributed Genome Hypothesis extends to non-pathogenic, ecologically flexible species like L. casei.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
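The core/accessory split reported in this record (1,715 core and 4,220 accessory genes across 17 genomes) follows the usual pan-genome definition: core genes occur in every strain, accessory genes in only some. A minimal sketch of that partition is given below, assuming each strain has already been reduced to a set of ortholog identifiers; the function name, strain labels, and gene identifiers are hypothetical, and the clustering step the authors used to define orthologs is not described in the abstract.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(
    strain_genes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Split a pan-genome into (core, accessory) gene sets.

    Illustrative only: core = genes present in every strain,
    accessory = genes present in at least one strain but not all.
    """
    if not strain_genes:
        raise ValueError("at least one strain is required")

    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)            # every gene seen in any strain
    core = set.intersection(*gene_sets)      # genes shared by all strains
    return core, pan - core


# Toy input with made-up ortholog identifiers.
toy = {
    "dairy_strain": {"g1", "g2", "g3"},
    "plant_strain": {"g1", "g2", "g4"},
    "human_strain": {"g1", "g2", "g5"},
}
core, accessory = partition_pan_genome(toy)
```

For the toy input, `core` holds the two genes shared by all three strains and `accessory` holds the three strain-specific ones.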
  • 98
    Publication Date: 2012-10-06
    Description: Background: Trypanosoma cruzi marinkellei is a bat-associated parasite of the subgenus Schizotrypanum and it is regarded as a T. cruzi subspecies. Here we report a draft genome sequence of T. c. marinkellei and comparison with T. c. cruzi. Our aims were to identify unique sequences and genomic features, which may relate to their distinct niches. Results: The T. c. marinkellei genome was found to be ~11% smaller than that of the human-derived parasite T. c. cruzi Sylvio X10. The genome size difference was attributed to copy number variation of coding and non-coding sequences. The sequence divergence in coding regions was ~7.5% between T. c. marinkellei and T. c. cruzi Sylvio X10. A unique acetyltransferase gene was identified in T. c. marinkellei, representing an example of a horizontal gene transfer from eukaryote to eukaryote. Six of eight examined gene families were expanded in T. c. cruzi Sylvio X10. The DGF gene family was expanded in T. c. marinkellei. T. c. cruzi Sylvio X10 contained ~1.5 fold more sequences related to VIPER and L1Tc elements. Experimental infections of mammalian cell lines indicated that T. c. marinkellei has the capacity to invade non-bat cells and undergo intracellular replication. Conclusions: Several unique sequences were identified in the comparison, including a potential subspecies-specific gene acquisition in T. c. marinkellei. The identified differences reflect the distinct evolutionary trajectories of these parasites and represent targets for functional investigation.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central
  • 99
    Publication Date: 2012-08-01
    Description: Background: The Hedgehog Signaling Pathway is one of the signaling pathways that are very important to embryonic development. The participation of inhibitors in the Hedgehog Signaling Pathway can control cell growth and death, and novel inhibitors of the pathway are in great demand. In fact, effective inhibitors could provide efficient therapies for a wide range of malignancies, and targeting this pathway in cells represents a promising new paradigm for cell growth and death control. Current research mainly focuses on the synthesis of inhibitors derived from cyclopamine, which bind specifically to the Smo protein and can be used for cancer therapy. While quantitative structure-activity relationship (QSAR) studies have been performed for these compounds among different cell lines, none of them have achieved acceptable results in the prediction of activity values of new compounds. In this study, we proposed a novel collaborative QSAR model for inhibitors of the Hedgehog Signaling Pathway by integrating information from multiple cell lines. Such a model is expected to substantially improve on the QSAR ability of single cell lines, and to provide useful clues for developing clinically effective inhibitors and modifications of parent lead compounds targeting the Hedgehog Signaling Pathway. Results: In this study, we have presented: (1) a collaborative QSAR model, which is used to integrate information among multiple cell lines to boost the QSAR results, rather than only a single cell line QSAR modeling. Our experiments have shown that the performance of our model is significantly better than single cell line QSAR methods; and (2) an efficient feature selection strategy under such a collaborative environment, which can derive the commonly important features related to all of the given cell lines, while simultaneously showing their specific contributions to a specific cell line. 
Based on feature selection results, we have proposed several possible chemical modifications to improve the inhibitor affinity towards multiple targets in the Hedgehog Signaling Pathway. Conclusions: Our model with the feature selection strategy presented here is efficient, robust, and flexible, and can be easily extended to model large-scale multiple cell line/QSAR data. The data and scripts for collaborative QSAR modeling are available in the Additional file 1.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 100
    Publication Date: 2012-08-01
    Description: High throughput gene expression technologies are a popular choice for researchers seeking molecular or systems-level explanations of biological phenomena. Nevertheless, there has been a groundswell of opinion that these approaches have not lived up to the hype because the interpretation of the data has lagged behind its generation. In our view a major problem has been an over-reliance on isolated lists of differentially expressed (DE) genes which, by simply comparing genes to themselves, have the pitfall of taking molecular information out of context. Numerous scientists have emphasised the need for better context. This can be achieved through holistic measurements of differential connectivity in addition to, or in place of, DE. However, many scientists continue to use isolated lists of DE genes as the major source of input data for common readily available analytical tools. Focussing this opinion article on our own research in skeletal muscle, we outline our resolutions to these problems, particularly a universally powerful way of quantifying differential connectivity. With a well designed experiment, it is now possible to use gene expression to identify causal mutations and the other major effector molecules with which they cooperate, irrespective of whether they themselves are DE. We explain why, for various reasons, no other currently available experimental techniques or quantitative analyses are capable of reaching these conclusions.
    Electronic ISSN: 1471-2164
    Topics: Biology
    Published by BioMed Central