-
1 Academic Journal
Authors: Xiao Yang, Shyamasree Saha, Aravind Venkatesan, Santosh Tirunagari, Vid Vartak, Johanna McEntyre
Source: Scientific Data, Vol 10, Iss 1, Pp 1-13 (2023)
Subject Terms: Science
Description: Named entity recognition (NER) is a widely used text-mining and natural language processing (NLP) subtask. In recent years, deep learning methods have superseded traditional dictionary- and rule-based NER approaches. A high-quality dataset is essential to fully leverage recent deep learning advancements. While several gold-standard corpora for biomedical entities in abstracts exist, only a few are based on full-text research articles. The Europe PMC literature database routinely annotates Gene/Proteins, Diseases, and Organisms entities. To transition this pipeline from a dictionary-based to a machine learning-based approach, we have developed a human-annotated full-text corpus for these entities, comprising 300 full-text open-access research articles. Over 72,000 mentions of biomedical concepts have been identified within approximately 114,000 sentences. This article describes the corpus and details how to access and reuse this open community resource.
File Description: electronic resource
Relation: https://doaj.org/toc/2052-4463
-
2
Authors: David Ochoa, Andrew Hercules, Miguel Carmona, Daniel Suveges, Jarrod Baker, Cinzia Malangone, Irene Lopez, Alfredo Miranda, Carlos Cruz-Castillo, Luca Fumis, Manuel Bernal-Llinares, Kirill Tsukanov, Helena Cornu, Konstantinos Tsirigos, Olesya Razuvayevskaya, Annalisa Buniello, Jeremy Schwartzentruber, Mohd Karim, Bruno Ariano, Ricardo Esteban Martinez Osorio, Javier Ferrer, Xiangyu Ge, Sandra Machlitt-Northen, Asier Gonzalez-Uriarte, Shyamasree Saha, Santosh Tirunagari, Chintan Mehta, Juan María Roldán-Romero, Stuart Horswell, Sarah Young, Maya Ghoussaini, David G Hulcoop, Ian Dunham, Ellen M McDonagh
Source: Nucleic Acids Research. 51:D1353-D1359
Subject Terms: Genetics
Description: The Open Targets Platform (https://platform.opentargets.org/) is an open source resource to systematically assist drug target identification and prioritisation using publicly available data. Since our last update, we have reimagined, redesigned, and rebuilt the Platform in order to streamline data integration and harmonisation, expand the ways in which users can explore the data, and improve the user experience. The gene–disease causal evidence has been enhanced and expanded to better capture disease causality across rare, common, and somatic diseases. For target and drug annotations, we have incorporated new features that help assess target safety and tractability, including genetic constraint, PROTACtability assessments, and AlphaFold structure predictions. We have also introduced new machine learning applications for knowledge extraction from the published literature, clinical trial information, and drug labels. The new technologies and frameworks introduced since the last update will ease the introduction of new features and the creation of separate instances of the Platform adapted to user requirements. Our new Community forum, expanded training materials, and outreach programme support our users in a range of use cases.
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3dffeff31dcc4de4b454719a05269273
https://doi.org/10.1093/nar/gkac1046
-
3 Academic Journal
Authors: Shyamasree Saha, Anirban Dutta
Source: International Management Institute, Global Business Review. 20(4):1010-1025
Description: The Indian insurance industry has gone through major transformations over the past two decades, and the life insurance sector has been no exception. With the introduction of privatization and the increase in the number of insurance providers, competition has risen to unprecedented levels. To survive this immense competition, the companies have come up with numerous products and solutions to lure customers. Other factors, such as increased coverage of lives/property, diverse customer-friendly products and rapid growth of multiple channels like agency and bancassurance, are gradually gaining momentum and intensifying the level of competitiveness in the market. Unfortunately, this has somehow compromised the quality of service delivered to customers, and no attempt has been made to measure and ensure the service quality of this industry. This research aims to identify the determinant factors that play a substantial role in shaping customers’ perception of service quality in the Indian life insurance sector. Six factors were identified after the whole data mining process (using factor analysis); these were considered to be the ‘Factors Influencing of Service Quality Perception in Indian Life Insurance Sector’.
-
4
Authors: Shyamasree Saha, David A. Matthews, Conrad Bessant
Source: Proteome Informatics ISBN: 9781782624288
Proteome Informatics
Subject Terms: Genetics, Transcriptome, Matching (statistics), Sequence database, Protein database, Computational biology, Biology, Proteomics, Multiple species
Description: The choice of protein sequence database used for peptide spectrum matching has a major impact on the extent and significance of protein identifications obtained in a given experiment. Finding a suitable database can be a major challenge, particularly when working with non-model organisms and complex samples containing proteins from multiple species. This chapter introduces the proteomics informed by transcriptomics (PIT) methodology, in which RNA-seq transcriptomics is used to generate a sample-specific protein database against which proteomic mass spectra can be searched. This approach extends the application of proteomics to studies in which it was not previously tractable, and is well suited to the discovery of novel translated genomic elements.
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::3471f78e625bc744e920c7446c247ca1
https://doi.org/10.1039/9781782626732-00385
-
5
Source: Saha, S, Chatzimichali, E A, Matthews, D A & Bessant, C 2018, 'PITDB: a database of translated genomic elements', Nucleic Acids Research, vol. 46, no. D1, gkx906, pp. D1223-D1228. https://doi.org/10.1093/nar/gkx906
Nucleic Acids Research
Subject Terms: 0301 basic medicine, Proteomics, Databases, Factual, Biology, computer.software_genre, Genome, 03 medical and health sciences, Open Reading Frames, User-Computer Interface, Protein sequencing, Tandem Mass Spectrometry, Genetics, Database Issue, Animals, Humans, Protein Isoforms, natural sciences, Amino Acid Sequence, Exact match, Internet, Database, Sequence Analysis, RNA, Proteins, 030104 developmental biology, Protein Biosynthesis, Proteome, Data Display, Splice isoforms, UniProt, computer, Algorithms
Description: PITDB is a freely available database of translated genomic elements (TGEs) that have been observed in PIT (proteomics informed by transcriptomics) experiments. In PIT, a sample is analyzed using both RNA-seq transcriptomics and proteomic mass spectrometry. Transcripts assembled from RNA-seq reads are used to create a library of sample-specific amino acid sequences against which the acquired mass spectra are searched, permitting detection of any TGE, not just those in canonical proteome databases. At the time of writing, PITDB contains over 74 000 distinct TGEs from four species, supported by more than 600 000 peptide spectrum matches. The database, accessible via http://pitdb.org, provides supporting evidence for each TGE, often from multiple experiments and an indication of the confidence in the TGE’s observation and its type, ranging from known protein (exact match to a UniProt protein sequence), through multiple types of protein variant including various splice isoforms, to a putative novel molecule. PITDB’s modern web interface allows TGEs to be viewed individually or by species or experiment, and downloaded for further analysis. PITDB is for bench scientists seeking to share their PIT results, for researchers investigating novel genome products in model organisms and for those wishing to construct proteomes for lesser studied species.
File Description: application/pdf
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9bfe1356fac482508286027c0e83be32
https://pubmed.ncbi.nlm.nih.gov/30053269
-
6
Source: Bioinformatics
BASE-Bielefeld Academic Search Engine
Subject Terms: Statistics and Probability, Document Structure Description, Support Vector Machine, Computer science, Context (language use), 02 engineering and technology, Biochemistry, Scientific discourse, Pattern Recognition, Automated, 03 medical and health sciences, Annotation, Documentation, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Molecular Biology, 030304 developmental biology, 0303 health sciences, Internet, Information retrieval, Conceptualization, Biomedical information, Original Papers, Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, Pattern recognition (psychology), 020201 artificial intelligence & image processing, Data and Text Mining, Periodicals as Topic, Sentence, Algorithms, Software
Description: Motivation: Scholarly biomedical publications report on the findings of a research investigation. Scientists use a well-established discourse structure to relate their work to the state of the art, express their own motivation and hypotheses and report on their methods, results and conclusions. In previous work, we have proposed ways to explicitly annotate the structure of scientific investigations in scholarly publications. Here we present the means to facilitate automatic access to the scientific discourse of articles by automating the recognition of 11 categories at the sentence level, which we call Core Scientific Concepts (CoreSCs). These include: Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model, Observation, Result and Conclusion. CoreSCs provide the structure and context to all statements and relations within an article and their automatic recognition can greatly facilitate biomedical information extraction by characterizing the different types of facts, hypotheses and evidence available in a scientific publication. Results: We have trained and compared machine learning classifiers (support vector machines and conditional random fields) on a corpus of 265 full articles in biochemistry and chemistry to automatically recognize CoreSCs. We have evaluated our automatic classifications against a manually annotated gold standard, and have achieved promising accuracies with ‘Experiment’, ‘Background’ and ‘Model’ being the categories with the highest F1-scores (76%, 62% and 53%, respectively). We have analysed the task of CoreSC annotation both from a sentence classification as well as sequence labelling perspective and we present a detailed feature evaluation. The most discriminative features are local sentence features such as unigrams, bigrams and grammatical dependencies while features encoding the document structure, such as section headings, also play an important role for some of the categories. We discuss the usefulness of automatically generated CoreSCs in two biomedical applications as well as work in progress. Availability: A web-based tool for the automatic annotation of articles with CoreSCs and corresponding documentation is available online at http://www.sapientaproject.com/software. http://www.sapientaproject.com also contains detailed information pertaining to CoreSC annotation and links to annotation guidelines as well as a corpus of manually annotated articles, which served as our training data. Contact: liakata@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3b15c8cc17e3acd76af1c8f81aec0647
http://europepmc.org/articles/PMC3315721
-
7
Authors: Jun Fan, Shyamasree Saha, Gary Barker, Kate J Heesom, Fawaz Ghali, Andrew R Jones, David A Matthews, Conrad Bessant
Source: Molecular & Cellular Proteomics: MCP
Subject Terms: Proteomics, ComputingMethodologies_PATTERNRECOGNITION, Technological Innovation and Resources, Data Mining, Humans, Databases, Protein, Transcriptome, Algorithms, Mass Spectrometry, Software, Workflow
Description: With the recent advent of RNA-seq technology the proteomics community has begun to generate sample-specific protein databases for peptide and protein identification, an approach we call proteomics informed by transcriptomics (PIT). This approach has gained a lot of interest, particularly among researchers who work with nonmodel organisms or with particularly dynamic proteomes such as those observed in developmental biology and host-pathogen studies. PIT has been shown to improve coverage of known proteins, and to reveal potential novel gene products. However, many groups are impeded in their use of PIT by the complexity of the required data analysis. Necessarily, this analysis requires complex integration of a number of different software tools from at least two different communities, and because PIT has a range of biological applications a single software pipeline is not suitable for all use cases. To overcome these problems, we have created GIO, a software system that uses the well-established Galaxy platform to make PIT analysis available to the typical bench scientist via a simple web interface. Within GIO we provide workflows for four common use cases: a standard search against a reference proteome; PIT protein identification without a reference genome; PIT protein identification using a genome guide; and PIT genome annotation. These workflows comprise individual tools that can be reconfigured and rearranged within the web interface to create new workflows to support additional use cases.
Access URL: https://explore.openaire.eu/search/publication?articleId=pmid________::295b2a50a7be713750c1952075d71f91
https://pubmed.ncbi.nlm.nih.gov/26269333
-
8
Authors: Maaly Nassar, Xingjun Pi, Xiao Yang, Francesco Talo, Christine Ferguson, Arthur Thouvenin, Yogmatee Roochun, David Stephenson, Vid Vartak, Shrey Sharma, Dayane Araújo, Lynne Faulk, Yuci Gou, Maria Levchenko, Santosh Tirunagari, Shyamasree Saha, Rakesh Nambiar, Michael Parkin, Frances Rogers, Audrey Hamelers, Mohamed Selim, Johanna McEntyre, Nikos Marinos, Zhan Huang, Michele Ide-Smith, Faisal Rahman, Aravind Venkatesan, Zunaira Shafique
Source: Nucleic Acids Research
Subject Terms: PubMed, 2019-20 coronavirus outbreak, Biomedical Research, Databases, Factual, AcademicSubjects/SCI00010, Download, MEDLINE, Biology, computer.software_genre, Biological Science Disciplines, News aggregator, World Wide Web, 03 medical and health sciences, 0302 clinical medicine, Genetics, Data Mining, Humans, Database Issue, Epidemics, Biological sciences, Data Curation, 030304 developmental biology, Internet, 0303 health sciences, Data curation, SARS-CoV-2, business.industry, COVID-19, Data resources, Europe, The Internet, business, computer, 030217 neurology & neurosurgery
Description: Europe PMC (https://europepmc.org) is a database of research articles, including peer reviewed full text articles and abstracts, and preprints, all freely available for use via website, APIs and bulk download. This article outlines new developments since 2017 where work has focussed on three key areas: (i) Europe PMC has added to its core content to include life science preprint abstracts and a special collection of full text of COVID-19-related preprints. Europe PMC is unique as an aggregator of biomedical preprints alongside peer-reviewed articles, with over 180 000 preprints available to search. (ii) Europe PMC has significantly expanded its links to content related to the publications, such as links to Unpaywall, providing wider access to full text, preprint peer-review platforms, all major curated data resources in the life sciences, and experimental protocols. The redesigned Europe PMC website features the PubMed abstract and corresponding PMC full text merged into one article page; there is more evident and user-friendly navigation within articles and to related content, plus a figure browse feature. (iii) The expanded annotations platform offers ∼1.3 billion text mined biological terms and concepts sourced from 10 providers and over 40 global data resources.
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::905302a0f5b2c7f4f09c7fd2427e900d
-
9 Academic Journal
Authors: Saha, Shyamasree
Contributors: Dutta, Anirban (author)
Source: Global business review : New Delhi [u.a.], vol 20, issue 4, pg. 1010-1025. 08/2019