Textrous! Extracting semantic textual meaning from gene sets

Chen, Hongyu; Martin, Bronwen; Daimon, Caitlin M.; Siddiqui, Sana; Luttrell, Louis M.; Maudsley, Stuart

doi:10.1371/JOURNAL.PONE.0062665

Title

Textrous! Extracting semantic textual meaning from gene sets

Author

Chen, Hongyu

Martin, Bronwen

Daimon, Caitlin M.

Siddiqui, Sana

Luttrell, Louis M.

Maudsley, Stuart

Abstract

The unbiased and reproducible interpretation of high-content gene sets from large-scale genomic experiments is crucial to the understanding of biological themes, validation of experimental data, and the eventual development of plans for future experimentation. To derive biomedically relevant information from simple gene lists, a mathematical association to scientific language and meaningful words or sentences is crucial. Unfortunately, existing software for deriving meaningful and easily appreciable scientific textual 'tokens' from large gene sets either relies on controlled vocabularies (Medical Subject Headings, Gene Ontology, BioCarta) or employs Boolean text searching and co-occurrence models that are incapable of detecting indirect links in the literature. As an improvement to existing web-based informatic tools, we have developed Textrous!, a web-based framework for the extraction of biomedical semantic meaning from a given input gene set of arbitrary length. Textrous! employs natural language processing techniques, including latent semantic indexing (LSI), sentence splitting, word tokenization, parts-of-speech tagging, and noun-phrase chunking, to mine MEDLINE abstracts, PubMed Central articles, articles from the Online Mendelian Inheritance in Man (OMIM), and Mammalian Phenotype annotation obtained from Jackson Laboratories. Textrous! can generate meaningful output data from even very small input datasets, using two different text extraction methodologies (collective and individual) for selecting, ranking, clustering, and visualizing English words obtained from the user data. Textrous! therefore facilitates the output of quantitatively significant and easily appreciable semantic words and phrases linked to both individual gene and batch genomic data.

Language

English

Source (journal)

PLoS ONE

Publication

2013

ISSN

1932-6203

DOI

10.1371/JOURNAL.PONE.0062665

Volume/pages

8:4 (2013), 11 p.

Article Reference

e62665

ISI

000319077300087

Medium

E-only publication

Full text (Publisher's DOI)

https://doi.org/10.1371/JOURNAL.PONE.0062665

Full text (open access)

https://repository.uantwerpen.be/docman/irua/7b0898/8893.pdf

Faculty/Department

Faculty of Pharmaceutical, Biomedical and Veterinary Sciences. Biomedical Sciences

Research group

Publication type

A1 Journal article

Subject

Engineering sciences. Technology

Identifier

Creation

06.01.2015

Last edited

04.03.2024

To cite this reference

https://hdl.handle.net/10067/1214990151162165141