
Tools and resources list

This page captures all tools and resources mentioned across the Infectious Diseases Toolkit.

Tool or resource Description Related pages Registry
1+ Million Genomes (1+MG) The 1+ Million Genomes (1+MG) initiative aims to enable secure access to genomics and the corresponding clinical data across Europe for better research, personalised healthcare and health policy making. Since Digital Day 2018, 25 EU countries, the UK and Norway have signed the Member States' declaration on stepping up efforts towards creating a European data infrastructure for genomic data and implementing common national rules enabling federated data access. The initiative forms part of the EU's agenda for the Digital Transformation of Health and Care and is aligned with the goals of the European Health Data Space. Training
ACE Cohort Asymptomatic COVID-19 in Education (ACE) Cohort Human biomolecular data Training
ADA-M Responsible sharing of biomedical data and biospecimens via the Automatable Discovery and Access Matrix (ADA-M). The Automatable Discovery and Access Matrix (ADA-M) provides a standardized way to unambiguously represent the conditions related to data discovery and access. By adopting ADA-M, data custodians can generally describe what their data are (the Header section), who can access them (the Permissions section), terms related to their use (the Terms section), and special conditions (the Meta-Conditions). By doing so, data custodians can participate in data sharing and collaboration by making meta information about their data computer-readable and hence directly available for digital communication, searching and automation activities. Human biomolecular data Tool info
ANNOVAR ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes. Human biomolecular data Pathogen characterisation Tool info
apex The Absolute Protein Expression (APEX) Quantitative Proteomics Tool is a free and open source Java implementation of the APEX technique for the quantitation of proteins based on standard LC-MS/MS proteomics data. Pathogen characterisation Tool info
ArrayExpress ArrayExpress is a database of functional genomics experiments that can be queried and the data downloaded. It includes gene expression data from microarray and high throughput sequencing studies. Data is collected to MIAME and MINSEQE standards. Experiments are submitted directly to ArrayExpress or are imported from the NCBI GEO database. Human biomolecular data Linked pathogen and ho... Tool info Standards/Databases Training
Arvados With Arvados, bioinformaticians run and scale compute-intensive workflows, developers create biomedical applications, and IT administrators manage large compute and storage resources. Human biomolecular data
Bcftools Bcftools is a set of tools for working with variant calls in the VCF format. Human biomolecular data Tool info
Beacon v2 Beacon v2 is a protocol/specification established by the Global Alliance for Genomics and Health initiative (GA4GH) that defines an open standard for federated discovery of genomic data and associated information in biomedical research and clinical applications. Human biomolecular data Human clinical and hea... Tool info Standards/Databases Training
BEAST BEAST is a cross-platform program for Bayesian phylogenetic analysis, estimating rooted, time-measured phylogenies using strict or relaxed molecular clock models. It uses Markov chain Monte Carlo (MCMC) to average over tree space and includes a graphical user interface for setting up analyses and tools for result analysis. Pathogen characterisation Tool info
BEAUti BEAUti is a graphical user-interface (GUI) application for generating BEAST XML files. Pathogen characterisation
Bento platform The Bento platform enables the research community to explore the BQC19 cohort aggregate data. Human biomolecular data
Beyond 1 Million Genomes (B1MG) The Beyond 1 Million Genomes (B1MG) project is helping to create a network of genetic and clinical data across Europe. The project provides coordination and support to the 1+ Million Genomes Initiative (1+MG). This initiative is a commitment of 24 EU countries, the UK and Norway to give cross-border access to one million sequenced genomes by 2022.
BioGRID BioGRID is a comprehensive biomedical repository for curated protein, genetic and chemical interactions Human biomolecular data Pathogen characterisation Tool info Standards/Databases
BioPortal A comprehensive repository of biomedical ontologies Human clinical and hea... Tool info Standards/Databases Training
BioSamples BioSamples stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry. Samples are either 'reference' samples (e.g. from 1000 Genomes, HipSci, FAANG) or have been used in an assay database such as the European Nucleotide Archive (ENA) or ArrayExpress. It provides links to assays and specific samples, and accepts direct submissions of sample information. Human biomolecular data Linked pathogen and ho... Tool info Standards/Databases Training
BioStudies The BioStudies database holds descriptions of biological studies, links to data from these studies in other databases at EMBL-EBI or outside, as well as data that do not fit in the structured archives at EMBL-EBI. The database can accept a wide range of types of studies described via a simple format. It also enables manuscript authors to submit supplementary information and link to it from the publication. Linked pathogen and ho... Tool info Standards/Databases Training
Bismark Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. Human biomolecular data Tool info Training
Bitbucket Git based code hosting and collaboration tool, built for teams. Human biomolecular data Pathogen characterisation Standards/Databases
Bowtie2 Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. Human biomolecular data Pathogen characterisation Tool info Training
BWA BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome. Human biomolecular data Pathogen characterisation Tool info Training
C-WAP CFSAN Wastewater Analysis Pipeline to estimate the percentage of SARS-CoV-2 variants in a sample. Pathogen characterisation
camflow CamFlow is a Linux Security Module (LSM) designed to capture data provenance for the purpose of system audit General guidelines
Canu Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing. Human biomolecular data Pathogen characterisation Tool info
CellDesigner CellDesigner is a structured diagram editor for drawing gene-regulatory and biochemical networks. Pathogen characterisation
CESSDA Data Catalogue (CDC) CDC is a one-stop shop for searching and finding European social science data. Socioeconomic data Tool info Standards/Databases
CESSDA Vocabulary Service CESSDA Vocabulary Service enables users to discover, browse, and download controlled vocabularies in a variety of languages. The service is provided by the Consortium of European Social Science Data Archives (CESSDA). The majority of the source (English) vocabularies included in the service have been created by the DDI Alliance. The Data Documentation Initiative (DDI) is an international standard for describing data produced by surveys and other observational methods in the social, behavioural, economic, and health sciences. Standards/Databases
Chenomx A commercial software package for NMR spectral processing that offers a semi-automated tool for spectral deconvolution, enabling interactive fitting of metabolite peaks to reference spectra and quantifying their concentrations. Pathogen characterisation
ClustalW ClustalW is a progressive multiple sequence alignment tool to align a set of sequences by repeatedly aligning pairs of sequences and previously generated alignments. Human biomolecular data Pathogen characterisation Tool info Training
COJAC The cojac package comprises a set of command-line tools to analyse co-occurrence of mutations on amplicons. Pathogen characterisation Tool info
Common Workflow Language (CWL) An open standard for describing workflows that are built from command-line tools. General guidelines Human clinical and hea... Standards/Databases Training
COMPSs COMP Superscalar (COMPSs) is a task-based programming model which aims to ease the development of applications for distributed infrastructures, such as large High-Performance clusters (HPC), clouds and container managed clusters. General guidelines Human clinical and hea... Tool info
COVID-19 BEACON The COVID-19 Beacon is a searchable platform for SARS-CoV-2 genomic variants conforming to the Beacon specifications of the Global Alliance for Genomics and Health but adjusted for viral genome searches. Human biomolecular data
COVID-19 Data Portal The COVID-19 Data Portal enables researchers to upload, access and analyse COVID-19 related reference data and specialist datasets. The aim of the COVID-19 Data Portal is to facilitate data sharing and analysis, and to accelerate coronavirus research. The portal includes relevant datasets submitted to EMBL-EBI as well as other major centres for biomedical data. The COVID-19 Data Portal is the primary entry point into the functions of a wider project, the European COVID-19 Data Platform. Human biomolecular data Human clinical and hea... Socioeconomic data The Swedish Pathogens ... Tool info Standards/Databases Training
COVID19 Disease Map The COVID-19 Disease Map is an assembly of molecular interaction diagrams, established based on literature evidence. Pathogen characterisation
COWWID A GitHub repository from the CBG-ETHZ group offering tools for detecting SARS-CoV-2 variants in Switzerland.
CRG COVID-19 Viral Beacon A platform allowing for browsing SARS-CoV-2 variability at the genome, amino acid, structural, and motif levels Human biomolecular data An automated SARS-CoV-...
Cromwell Cromwell is a Workflow Management System geared towards scientific workflows. Human biomolecular data
cwltool Reference implementation to provide comprehensive validation of CWL files as well as provide other tools related to working with CWL. Human biomolecular data
Cytoscape Cytoscape provides a solid platform for network visualization and analysis Human biomolecular data Pathogen characterisation Tool info Training
DAGitty DAGitty is a browser-based environment for creating, editing, and analyzing causal diagrams (also known as directed acyclic graphs or causal Bayesian networks). Prototyping federated ... Tool info Standards/Databases
Danish Research Health Data Gateway Tool that provides the user with an overview of available health data in Denmark and the entire application process, from initial idea to final application. Human clinical and hea...
Data Structure Wizard (DSW) The Data Structure Wizard (DSW) is a Java standalone desktop application that supports version 2.0 & 2.1 of the SDMX standard. Socioeconomic data
DAVID The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes. Human biomolecular data Tool info Training
dbGaP The Database of Genotypes and Phenotypes (dbGaP) archives and distributes the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits. Human biomolecular data Tool info Standards/Databases Training
dbNSFP A comprehensive database of transcript-specific functional predictions and annotations for human non-synonymous and splice-site SNVs Human biomolecular data Pathogen characterisation Tool info
DCAT An RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications easily to consume metadata from multiple catalogs. Human clinical and hea... Human biomolecular data Standards/Databases
DDI Tools Searchable list of tools available to help you work with DDI, from authoring and editing metadata to data transformations. The tools have been developed independently by a variety of organizations from the global DDI community.
dedupe Python library that uses machine learning to perform fuzzy matching, deduplication, and entity resolution quickly on structured data. Socioeconomic data
DeepVariant DeepVariant is a deep learning-based variant caller that takes aligned reads (in BAM or CRAM format), produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports the results in a standard VCF or gVCF file. Human biomolecular data Tool info
Delly Delly is an integrated structural variant (SV) prediction method that can discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read and long-read massively parallel sequencing data. Human biomolecular data Tool info
DESeq2 Differential gene expression analysis based on the negative binomial distribution Human biomolecular data Tool info Training
Docker Docker is software for the execution of applications in virtualized environments called containers. It is linked to DockerHub, a library for sharing container images. Human biomolecular data Standards/Databases Standards/Databases Training
dplyr dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges. Socioeconomic data Training
Dragen-GATK DRAGEN-GATK Best Practices contains open-source workflows that are compatible between Illumina's platforms and mainstream infrastructure. Human biomolecular data Pathogen characterisation
Dryad Dryad is an open-source, community-led data curation, publishing, and preservation platform for CC0 publicly available research data. Human biomolecular data Standards/Databases
Dutch COVID-19 Data Portal The Dutch COVID-19 Data Portal provides researchers with a clear overview of what is available, allows searching for specific data and makes access to such data easier when the necessary ethical and legal conditions have been met. Human biomolecular data
EBI The European Bioinformatics Institute is a bioinformatics research center that is part of the European Molecular Biology Laboratory and is located in Hinxton, England. The institution combines intense research activity with the development and maintenance of a set of bioinformatics lines, services and databases. Human biomolecular data Training
EdgeR Empirical Analysis of Digital Gene Expression Data in R Human biomolecular data Tool info Training
EGA Beacon Interface to query on the EGA data through Beacon v2 Human biomolecular data
ENA upload CLI Command line tool (CLI) allowing easy submission of data and respective metadata to the European Nucleotide Archive (ENA) using tabular files or an Excel spreadsheet. The tool allows programmatic submission of all ENA objects (study, sample, run and experiment) without the need to log in to the Webin interface. This also includes client-side validation using ENA checklists and releasing the ENA objects. Using the ENA data sub... SARS-CoV-2 sequencing ...
ENA upload Galaxy tool Galaxy tool wrapper of the ENA upload CLI to submit experimental data and respective metadata to the European Nucleotide Archive (ENA). Using the ENA data sub... SARS-CoV-2 sequencing ...
ENA Webin CLI Galaxy wrapper to submit consensus sequences to ENA in an interactive way. The tool has the Webin-CLI script of ENA at its core and supports all sample checklists. Using the ENA data sub...
Enrichr Functional Enrichment Analysis and Network Construction Pathogen characterisation Tool info
Estonian Biobank The Estonian Biobank has established a population-based biobank of Estonia with a current cohort size of more than 200,000 individuals (genotyped with genome-wide arrays), reflecting the age, sex and geographical distribution of the adult Estonian population. Considering the fact that about 20% of Estonia's adult population has joined the programme, it is indeed a database that is very important for the development of medical science both domestically and internationally. Human biomolecular data
Estonian COVID-19 Data Portal Estonian instance of the COVID-19 Data Portal. Among other information, it has served Estonian SARS-CoV-2 sequencing dashboards. SARS-CoV-2 sequencing ... The Swedish Pathogens ...
EUI COVID-19 SSH Data Portal The COVID-19 SSH Data Portal provides integrated search, discovery, and linking to datasets published on the web relevant for COVID-19-related research in the Social Sciences and Humanities. Socioeconomic data Standards/Databases
EuroHPC EuroHPC Joint Undertaking is a joint initiative between the EU, European countries and private partners to develop a World Class Supercomputing Ecosystem in Europe. Pathogen characterisation
European Centre for Disease Prevention and Control (ECDC) ECDC is an EU agency aimed at strengthening Europe's defences against infectious diseases. Its mission is to identify, assess and communicate current and emerging threats to human health posed by infectious diseases. Human clinical and hea...
European Clinical Research Infrastructure Network (ECRIN) tools ECRIN develops, contributes to, and maintains freely accessible tools that facilitate the identification of clinical trial objects, data sharing, access to regulatory and methodological designs and much more to support researchers looking to conduct multinational clinical research. Human clinical and hea...
European Genome-phenome Archive (EGA) The European Genome-phenome Archive (EGA) is a service for permanent archiving and sharing of personally identifiable genetic, phenotypic, and clinical data generated for the purposes of biomedical research projects or in the context of research-focused healthcare systems. Access to data must be approved by the specified Data Access Committee (DAC). Human biomolecular data Human clinical and hea... Linked pathogen and ho... Tool info Standards/Databases Training
European Health Data Space (EHDS) The European Health Data Space is a health specific ecosystem comprised of rules, common standards and practices, infrastructures and a governance framework that aims at empowering individuals through increased digital access to and control of their electronic personal health data, at national level and EU-wide. Human clinical and hea... Human clinical and hea...
European Health Information Portal The Health Information Portal provides access to population health and healthcare data across Europe. Human clinical and hea... Standards/Databases
European Language Social Science Thesaurus (ELSST) The European Language Social Science Thesaurus (ELSST) is a broad-based, multilingual thesaurus for the social sciences. It is owned and published by the Consortium of European Social Science Data Archives (CESSDA) and its national Service Providers. Standards/Databases
European Medicines Agency (EMA) The European Medicines Agency (EMA) is a decentralised agency of the European Union (EU). It is responsible for the scientific evaluation, supervision and safety monitoring of medicines. Human clinical and hea...
European Nucleotide Archive (ENA) Provides a record of the nucleotide sequencing information. It includes raw sequencing data, sequence assembly information and functional annotation. Pathogen characterisation Human clinical and hea... Pathogen characterisation Human biomolecular data An automated SARS-CoV-... Using the ENA data sub... SARS-CoV-2 sequencing ... Linked pathogen and ho... Tool info Standards/Databases Training
FAIRsharing FAIRsharing is a FAIR-supporting resource that provides an informative and educational registry on data standards, databases, repositories and policy, alongside search and visualization tools and services that interoperate with other FAIR-enabling resources. FAIRsharing guides consumers to discover, select and use standards, databases, repositories and policy with confidence, and producers to make their resources more discoverable, more widely adopted and cited. Each record in FAIRsharing is curated in collaboration with the maintainers of the resource themselves, ensuring that the metadata in the FAIRsharing registry is accurate and timely. Pathogen characterisation Human biomolecular data Ethical, Legal, and So... Standards/Databases Training
FASTQC A quality control tool for high throughput sequence data. Pathogen characterisation Human biomolecular data Pathogen characterisation Tool info Training
FastQC Screen FastQ Screen is a quality control tool used to detect contamination in sequencing data (FASTQ files). Human biomolecular data
Federated EGA The Federated EGA is an infrastructure built upon the European Genome-phenome Archive (EGA), an EMBL-EBI and CRG data resource for secure archiving and sharing of human sensitive biomolecular and phenotypic data resulting from biomedical research projects. Human biomolecular data Human clinical and hea... Training
Figshare Figshare is a generalist, subject-agnostic repository for many different types of digital objects that can be used without cost to researchers. Data can be submitted to the central figshare repository (described here), or institutional repositories using the figshare software can be installed locally, e.g. by universities and publishers. Human biomolecular data Standards/Databases Training
Findata Findata is the data permit authority for the social and health care sector in Finland. Human clinical and hea...
Flye Flye is a de novo assembler for single-molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. Human biomolecular data Tool info Training
freebayes freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs, indels, MNPs, and complex events smaller than the length of a short-read sequencing alignment. Human biomolecular data Pathogen characterisation Tool info Training
French Health Data Hub The French Health Data Hub guarantees easy, unified, transparent and secure access to health data to improve the quality of care and patient support. Human clinical and hea...
Freyja Freyja is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). Pathogen characterisation Tool info
g:Profiler g:GOSt performs functional enrichment analysis, also known as over-representation analysis (ORA) or gene set enrichment analysis, on an input gene list. Pathogen characterisation Tool info Training
Galaxy Open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses. Human biomolecular data Pathogen characterisation General guidelines Human clinical and hea... Using the ENA data sub... Tool info Training
Galaxy Europe The European Galaxy server. Provides access to thousands of tools for scalable and reproducible analysis. Pathogen characterisation An automated SARS-CoV-... Training
Galaxy University of Tartu The University of Tartu Galaxy instance. Enables local university users to run their analyses in the Galaxy environment. Was heavily used during the KoroGenoEST sequencing studies. SARS-CoV-2 sequencing ...
GenBank GenBank is the NIH genetic sequence database of annotated collections of all publicly available DNA sequences. Human biomolecular data Tool info Standards/Databases Training
GeneMANIA GeneMANIA helps you predict the function of your favourite genes and gene sets. Human biomolecular data Tool info Training
Genome Analysis Toolkit (GATK) GATK is a widely used tool for variant calling and genotyping from NGS data. Human biomolecular data Tool info
Genomic Data Infrastructure (GDI) The Genomic Data Infrastructure (GDI) project is enabling access to genomic and related phenotypic and clinical data across Europe. It is doing this by establishing a federated, sustainable and secure infrastructure to access the data. It builds on the outputs of the Beyond 1 Million Genomes (B1MG) project and is realising the ambition of the 1+Million Genomes (1+MG) initiative.
GEO The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. It accepts next-generation sequence data that examine quantitative gene expression, gene regulation, epigenomics or other aspects of functional genomics using methods such as RNA-seq, miRNA-seq, ChIP-seq, RIP-seq, HiC-seq, methyl-seq, etc. GEO will process all components of your study, including the samples, project description and processed data files, and will submit the raw data files to the Sequence Read Archive (SRA) on the researcher's behalf. In addition to data storage, a collection of web-based interfaces and applications is available to help users query and download the studies and gene expression patterns stored in GEO. Human biomolecular data Standards/Databases Training
ggplot2 ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. Human biomolecular data Tool info Training
GitHub GitHub is a versioning system, used for sharing code, as well as for sharing of small data. Human biomolecular data Pathogen characterisation An automated pipeline ... Standards/Databases Standards/Databases Training
GitLab GitLab is an open source end-to-end software development platform with built-in version control, issue tracking, code review, CI/CD, and more. Self-host GitLab on your own servers, in a container, or on a cloud provider. Human biomolecular data Pathogen characterisation Standards/Databases Training
Global Alliance for Genomics and Health (GA4GH) The metadata model for GA4GH, an international coalition of both public and private interested parties, formed to enable the sharing of genomic and clinical data. Human biomolecular data Tool info Standards/Databases Training
Global Initiative on Sharing All Influenza Data (GISAID) A web-based platform for sharing viral sequence data, initially for influenza data, and now for other pathogens (including SARS-CoV-2). Pathogen characterisation Human clinical and hea... Pathogen characterisation Standards/Databases
GO GO is used to perform enrichment analysis on gene sets. Human biomolecular data Pathogen characterisation Tool info Training
GRAF pop GRAF pop is a software tool that infers the subject ancestry. Human biomolecular data
GRAF sex Tool that determines subject sexes using the genotypes. Human biomolecular data
GRIDSS GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements. Human biomolecular data Tool info
GSEA Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states Human biomolecular data Tool info Training
GTEx The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 53 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank. The GTEx Portal provides open access to data including gene expression, QTLs, and histology images. Human biomolecular data Tool info Standards/Databases Training
Health Data Research UK HDR UK is a national institute that aims to unite the UK’s health and care data to enable discoveries that improve people’s lives. It does this by uniting, improving and using health and care data as one national institute. Human clinical and hea...
HISAT2 HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) to a population of human genomes (as well as to a single reference genome). Human biomolecular data Tool info Training
IGV The Integrative Genomics Viewer (IGV) is a high-performance, easy-to-use, interactive tool for the visual exploration of genomic data. Human biomolecular data Tool info Training
IntAct IntAct (Molecular Interaction Database) Website Human biomolecular data Pathogen characterisation Tool info Standards/Databases Training
IQtree IQ-TREE is designed to efficiently handle large phylogenomic datasets, utilize multicore and distributed parallel computing for faster analysis, and automatically resume interrupted analyses through checkpointing. Pathogen characterisation Tool info
ISARIC COVID-19 Case Report Form The ISARIC-WHO Case Report Forms (CRFs) should be used to collect data on individuals presenting with suspected or confirmed COVID-19, with the aim to standardise clinical data to improve patient care and inform the public health response. Linked pathogen and ho... Standards/Databases
Kallisto Kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. Pathogen characterisation Tool info
KEGG A set of annotation maps for Kyoto encyclopedia of genes and genomes (KEGG) Human biomolecular data Tool info Training
Kraken 2 A taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds. Pathogen characterisation
LimeSurvey LimeSurvey is a free and open source advanced online survey system to create online surveys.
Lineagespot Lineagespot is a framework written in R that aims to identify SARS-CoV-2 related mutations based on a single variant file or a list of variant files. Pathogen characterisation Tool info
Lumpy A probabilistic framework for structural variant discovery. Human biomolecular data Tool info
MACS Model-based Analysis of ChIP-Seq (MACS), for identifying transcription factor binding sites. Human biomolecular data Tool info Training
MAFFT MAFFT is a multiple sequence alignment program Human biomolecular data Pathogen characterisation Tool info
Manta Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. Human biomolecular data Tool info
matplotlib Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Human biomolecular data Socioeconomic data Tool info Training
MAXQUANT MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data. Pathogen characterisation Tool info Training
MEGAHIT MEGAHIT is an ultra-fast and memory-efficient NGS assembler optimized for metagenomes. Pathogen characterisation Tool info
MetaboAnalyst MetaboAnalyst is a comprehensive platform dedicated to metabolomics data analysis via a user-friendly, web-based interface. Human biomolecular data Pathogen characterisation Tool info Training
Metagen-FastQC Cleans metagenomic reads to remove adapters, low-quality bases and host (e.g. human) contamination. Using the ENA data sub... SARS-CoV-2 sequencing ...
MethylKit methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. Human biomolecular data Tool info
methylPipe Base resolution DNA methylation data analysis Human biomolecular data Tool info
MetSign A computational platform for high-resolution mass spectrometry-based metabolomics Human biomolecular data
MIABIS MIABIS represents the minimum information required to initiate collaborations between biobanks and to enable the exchange of biological samples and data. The aim is to facilitate the reuse of bio-resources and associated data by harmonizing biobanking and biomedical research. Human biomolecular data Standards/Databases
Mouse Brain Alignment Tool (MBAT) Semi-automated processing of autoradiography (ARG) images from mouse brain tissue. This project includes a Napari-based user interface where ARG slides can be preprocessed and registered to Allen Brain Atlas regions. An automated pipeline ...
MrBayes MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters. Pathogen characterisation Tool info
MultiQC MultiQC searches a given directory for analysis logs and compiles an HTML report. Pathogen characterisation Tool info Training
MUSCLE MUSCLE is widely used software for making multiple alignments of biological sequences. Human biomolecular data Pathogen characterisation Tool info Training
Mzmine MZmine 3 is an open-source software for mass-spectrometry data processing, with the main focus on LC-MS data. Human biomolecular data Pathogen characterisation Tool info
National Center for Biotechnology Information (NCBI) The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information. Human biomolecular data Training
Nextflow Nextflow is a framework for data analysis workflow execution Human biomolecular data Pathogen characterisation Tool info Training
Nextstrain Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. Pathogen characterisation Tool info Training
Nextstrain Auspice Estonian local instance of the Nextstrain Auspice application that serves SARS-CoV-2 phylogenetic data SARS-CoV-2 sequencing ...
noWorkFlow The noWorkflow project aims at allowing scientists to benefit from provenance data analysis even when they don't use a workflow system. General guidelines
NumPy Python library for scientific computing. Socioeconomic data Socioeconomic data Tool info Training
NUTS The NUTS (Nomenclature of territorial units for statistics) classification, developed by Eurostat, is a hierarchical system for dividing up the economic territory of the EU and the UK. Training
ODISSEI Secure ANalysis Environment (SANE) SANE is a virtual container in which the researcher can analyse sensitive data, while the data owner retains full control. Socioeconomic data
Omicsgenerator Omics Integrator is a package designed to integrate proteomic data, gene expression data and/or epigenetic data using a protein-protein interaction network. Human biomolecular data
OMSSA OMSSA (Open Mass Spectrometry Search Algorithm) is a tool to identify peptides in tandem mass spectrometry (MS/MS) data. The OMSSA algorithm uses a classic probability score to compute specificity. See also The NCBI C++ Toolkit and The NCBI C++ Toolkit Book. Pathogen characterisation Tool info
Ontology Lookup Service EMBL-EBI's web portal for finding ontologies Human clinical and hea... Tool info Standards/Databases Training
OpenBEL The OpenBEL Framework is an open-platform technology for managing, publishing, and using biological knowledge represented using the Biological Expression Language (BEL). Knowledge Graph Genera...
OpenMS OpenMS is an open-source software C++ library for LC-MS data management and analyses. Human biomolecular data Pathogen characterisation Tool info Training
OpenProvenance Set of user-friendly web applications for storing, validating, and translating W3C PROV-based provenance representations. General guidelines
OpenRefine Powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data. Socioeconomic data Training
pandas Open source data analysis and manipulation tool, built on top of the Python programming language. Socioeconomic data Socioeconomic data Tool info Training
Panther The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System is a unique resource that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in the absence of direct experimental evidence. Human biomolecular data Tool info Standards/Databases Training
Pathogens Portal The Pathogens Portal, launched in July 2023, is an invaluable resource for researchers, clinicians, and policymakers who need access to the latest and most comprehensive datasets on pathogens. The portal is a collaborative effort between the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) and partners. Pathogen characterisation Linked pathogen and ho... The Swedish Pathogens ... Standards/Databases Training
Pathogens Portal Cohort Browser The Pathogens Portal Cohort Browser presents discovery metadata of infectious disease cohort datasets and provides links to the associated datasets within ELIXIR Core Data Resources; search and filtering functionalities enable users to identify cohort studies of interest in a convenient manner. Linked pathogen and ho...
Pathogenwatch Pathogenwatch provides species and taxonomy prediction for over 60,000 variants of bacteria, viruses, and fungi. Pathogen characterisation
PepArMl A Meta-Search Peptide Identification Platform for Tandem Mass Spectra Pathogen characterisation Tool info
PHES-ODM A data model to improve wastewater surveillance through interoperable data. Pathogen characterisation
PhyML PhyML is a software package that uses modern statistical approaches to analyse alignments of nucleotide or amino acid sequences in a phylogenetic framework. Human biomolecular data Tool info
Picard Picard is a suite of tools that provides quality control and processing of NGS data, including duplicate read removal, format conversion, and alignment. Human biomolecular data Tool info
PiGx SARS-CoV-2 Wastewater Sequencing Pipeline PiGx SARS-CoV-2 is a pipeline for analysing data from sequenced wastewater samples and identifying given lineages of SARS-CoV-2. Pathogen characterisation
Population Health Information Research Infrastructure (PHIRI) PHIRI is the roll-out of the research infrastructure on population health information that aims to facilitate and generate the best available evidence for research on health and well-being of populations as impacted by COVID-19. Human clinical and hea...
Prov python Python implementation of the PROV data model. General guidelines Training
Provenance storage A prototype of a provenance management service implementing the CPM (ISO 23494-2). General guidelines
provR Collect meta-data from scripts written in the R programming language. General guidelines
ProvToolbox Java implementation of the PROV data model. General guidelines
PyBEL Pure Python package for parsing and handling biological networks encoded in the Biological Expression Language (BEL). Knowledge Graph Genera... Tool info
QIIME 2 QIIME 2 is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency. Pathogen characterisation Tool info Training
Qualimap Qualimap is a quality control tool that assesses the quality of the sequencing data at different stages of the analysis pipeline, including read mapping, coverage, and expression analysis. Human biomolecular data Tool info
Quarto Quarto is an open-source scientific and technical publishing system that enables the creation of dynamic and reproducible content. Prototyping federated ... Training
R Free software environment for statistical computing and graphics. Socioeconomic data Tool info Training
R Markdown R Markdown can help to turn your analyses into high quality documents, reports, presentations and dashboards. Training
R Shiny Shiny is an R package that makes it easy to build interactive web apps straight from R. Training
RAxML A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies Pathogen characterisation Tool info
RDataTracker An R library to collect provenance from R scripts. General guidelines
ReAdW Convert Thermo Finnigan RAW mass spectrometry files to the mzXML format. Pathogen characterisation Tool info
recordr Provenance tracking for R General guidelines
Research Data Centre at the BfArM Research at the BfArM concentrates on important and contemporary research focal points with regard to the marketing authorisation of medicinal products and improving the safety thereof as well as concerning the recording and assessment of risks in connection with medical devices. Human clinical and hea...
Research Object Crate (RO-Crate) RO-Crate is a lightweight approach to packaging research data with their metadata, using schema.org. An RO-Crate is a structured archive of all the items that contributed to the research outcome, including their identifiers, provenance, relations and annotations. General guidelines Standards/Databases
ResFinder ResFinder identifies acquired genes and/or finds chromosomal mutations mediating antimicrobial resistance in total or partial DNA sequence of bacteria. Pathogen characterisation Tool info
SAMtools SAMtools is a suite of programs for interacting with high-throughput sequencing data. Pathogen characterisation Human biomolecular data Pathogen characterisation Tool info Training
Sapporo WES Implementation of Workflow Execution Service (WES) or so-called Workflow-as-a-Service. General guidelines Human clinical and hea... Tool info
SARS-CoV-2 Contextual Data Specification A SARS-CoV-2 Contextual Data Specification from PHA4GE. Pathogen characterisation
SARS-CoV-2 Data Hubs Using technology that builds upon existing EMBL-EBI infrastructure, we provide SARS-CoV-2 Data Hubs to those public health agencies and other scientific groups responsible for generating viral sequence data from the outbreak at national or regional levels. Human biomolecular data
SARS-COV-2 outbreak in Andalucia SARS-CoV-2 whole genome sequencing circuit of Andalusia Human biomolecular data
Schema.org Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. Human clinical and hea... General guidelines Standards/Databases Training
Scikit-learn Machine learning tools in Python Socioeconomic data Socioeconomic data Tool info Training
SDMX Registry SDMX Fusion Metadata Registry Socioeconomic data
SDMX-Reference Infrastructure (SDMX-RI) The SDMX-Reference Infrastructure (SDMX-RI) is a set of pick-and-choose building blocks and tools that allow data to be exposed to the external world through access rights by using web services. Socioeconomic data
seaborn Python data visualization library that provides a high-level interface for drawing attractive and informative statistical graphics. Socioeconomic data Training
SICER2 Redesigned and improved ChIP-seq broad peak calling tool SICER Human biomolecular data
Singularity Singularity is a widely-adopted container runtime that implements a unique security model to mitigate privilege escalation risks and provides a platform to capture a complete application environment into a single file (SIF) Human biomolecular data Training
Snakemake Snakemake is a framework for data analysis workflow execution Human biomolecular data Pathogen characterisation General guidelines Human clinical and hea... Tool info Training
SNippy Rapid haploid variant calling and core genome alignment. Pathogen characterisation Tool info
SnpEff Genetic variant annotation and functional effect prediction toolbox. It annotates and predicts the effects of genetic variants on genes and proteins. Human biomolecular data Pathogen characterisation Tool info Training
SPAdes SPAdes is an assembly toolkit containing various assembly pipelines. Human biomolecular data Pathogen characterisation Tool info Training
SRA The SRA is NIH's primary archive of high-throughput sequencing data and is part of the International Nucleotide Sequence Database Collaboration (INSDC), which includes the NCBI Sequence Read Archive (SRA), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ). Data submitted to any of the three organizations are shared among them. SRA accepts data from all kinds of sequencing projects, including clinically important studies that involve human subjects or their metagenomes, which may contain human sequences. These data often have controlled access via dbGaP (the database of Genotypes and Phenotypes). Human biomolecular data Tool info Standards/Databases Training
Stanford HIV Drug Resistance Database (HIVDB) A curated database containing nearly all published HIV RT and protease sequences: a resource designed for researchers studying evolutionary and drug-related variation in the molecular targets of anti-HIV therapy. Pathogen characterisation
STAR Spliced Transcripts Alignment to a Reference Human biomolecular data Tool info Training
StreamFlow Container-native workflow manager for hybrid infrastructures General guidelines Human clinical and hea...
Swedish Pathogens Portal The Swedish Pathogens Portal was previously known as the Swedish COVID-19 Data Portal. It is the Swedish national node of the Pathogens Portal, aimed at facilitating the sharing of data related to pathogens and pandemic preparedness. Pathogen characterisation The Swedish Pathogens ... Standards/Databases
TCGA The Cancer Genome Atlas (TCGA) is a comprehensive, collaborative effort led by the National Institutes of Health (NIH) to map the genomic changes associated with specific types of tumors to improve the prevention, diagnosis and treatment of cancer. Its mission is to accelerate the understanding of the molecular basis of cancer through the application of genome analysis and characterization technologies. Human biomolecular data Standards/Databases Training
The Data Use Ontology (DUO) The Data Use Ontology (DUO) describes data use requirements and limitations. DUO allows datasets to be semantically tagged with restrictions on their usage, making them automatically discoverable based on the authorization level of users or the intended usage. This resource is based on the OBO Foundry principles and developed using the W3C Web Ontology Language. It is used in production by the European Genome-phenome Archive (EGA) at EMBL-EBI and CRG as well as the Broad Institute for the Data Use Oversight System (DUOS). Human biomolecular data Standards/Databases
The European Health Information Gateway The European Health Information Gateway is a platform that provides access to various health information resources and datasets from across Europe, including data on health systems, health determinants, and health outcomes. Human clinical and hea...
The National health service metadata catalogue Tool to find health data in the metadata catalogue in Portugal. Human clinical and hea...
The Open Biological and Biomedical Ontology (OBO) Foundry Collaborative effort to develop interoperable ontologies for the biological sciences Human clinical and hea... Standards/Databases
The Public Service Data Catalogue Tool to discover the Data held by the Irish Public Service Human clinical and hea...
tidyr Tidy data describes a standard way of storing data that is used wherever possible throughout the [tidyverse](https://www.tidyverse.org/). Socioeconomic data
toil-cwl-runner The toil-cwl-runner command provides cwl-parsing functionality using cwltool, and leverages the job-scheduling and batch system support of Toil. Human biomolecular data
Trifacta Trifacta is designed for analysts to explore, transform, and enrich raw data into clean and structured formats. Socioeconomic data
Trimmomatic Trimmomatic is a tool used for the removal of adapter sequences, low-quality reads, and sequences with ambiguous bases from NGS data. Pathogen characterisation Human biomolecular data Tool info Training
UCSC Genome Browser An online tool for analyzing and visualizing genomic data. It allows users to add and share annotations. Human biomolecular data An automated SARS-CoV-... Tool info Standards/Databases
UNottingham Beacon Beacon from UNottingham to query a backend OMOP database of synthetic COVID-19 patient EHRs (electronic health records). Human biomolecular data
VarScan Variant calling and somatic mutation/CNV detection for next-generation sequencing data Human biomolecular data Pathogen characterisation Tool info
VCFtools VCFtools is a program package designed for working with VCF files. Pathogen characterisation Tool info
Velvet Velvet is an algorithm package that has been designed to deal with de novo genome assembly and short read sequencing alignments. Pathogen characterisation Tool info Training
VEP VEP (Variant Effect Predictor) predicts the functional effects of genomic variants. Human biomolecular data Pathogen characterisation Tool info Training
Viral AI A global network for genomic surveillance and infectious disease research Human biomolecular data
VLQ A pipeline for lineage abundance estimation from wastewater sequencing data. Pathogen characterisation
Webin-CLI Command line application to submit assemblies and transcriptomes to ENA. Using the ENA data sub... Training
WfExS Workflow Execution Service Backend (WfExS-backend) is a high-level orchestrator to run scientific workflows reproducibly. General guidelines
WorkflowHub A registry for describing, sharing and publishing scientific computational workflows. Pathogen characterisation An automated SARS-CoV-... Tool info Standards/Databases Training
wtdbg2 Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT). Human biomolecular data Tool info
X! Tandem X! Tandem open source is software that can match tandem mass spectra with peptide sequences, in a process that has come to be known as protein identification. Pathogen characterisation
xcms Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Pathogen characterisation Tool info Training
XCMS Online A systems biology tool for analyzing metabolomic data. It automatically superimposes raw metabolomic data onto metabolic pathways and integrates it with transcriptomic and proteomic data. Human biomolecular data Tool info
Zenodo Zenodo is a generalist research data repository built and developed by OpenAIRE and CERN. Human biomolecular data Standards/Databases Training
Epidemiology of Infectious Diseases (Epistat)

A web based application for visualising and exploring data on infectious diseases monitored by Sciensano.

COVID Epistat

Dashboard for COVID-19 epidemiological data (vaccination, laboratory testing, wastewater, variants, hospitalised patients, mortality, nursing home patients, mental health indicators, seroprevalence).

Sciensano R Shiny Apps

Shiny is an R package that makes it easy to build interactive web apps straight from R. At Sciensano, several Shiny Apps have been developed to process, analyse and visualise data during the COVID-19 crisis. These include the Surge App (monitoring and data quality of COVID-19 hospitalisations), Indicator App (COVID-19 indicators based on test positivity rates per province), Coverage App (coverage of clinical database on hospitalised patients), Hospital Indicators (forecasting and profile of hospitalised patients), and Quality of reporting (quality indicators of reporting by individual hospitals).

R Shiny
R Markdown

Tool at Sciensano for generating weekly and daily reports.

Sciensano LimeSurvey

Tool for online surveys.

LimeSurvey
DMPonline.be

Tool that provides templates for data management plans.

Figures on notifiable infectious diseases

Dashboard for notifiable infectious diseases in Flanders.

Galaxy Belgium

Galaxy Belgium is a Galaxy instance managed by the Belgian ELIXIR node, funded by the Flemish government, which utilises infrastructure provided by the Flemish Supercomputer Center (VSC).

Galaxy
COVID-NL clinical data dashboard

The Dutch national COVID-19 clinical data dashboard allows exploration and reuse of clinical data from Dutch university medical centers (UMCs). The dashboard provides researchers with a clear overview of what is available, allows searching for specific data and makes access to such data easier when the necessary ethical and legal conditions have been met. The policy for access to and sharing of clinical COVID-19 data is described in the HRI COVID policy document.

Standards/Databases
COVID-NL metadata portal

The Dutch national COVID-19 metadata portal describes the content of the collections and type of data. The underlying data remains at the source, but where possible a link to the data or the data request procedure is provided on the portal. The first health care data sets in the portal come from observational studies funded by ZonMw, NFU COVID-19 clinical research data, collaborating top clinical hospitals (STZ), as well as other regional hospitals. However, the portal is open to any health care provider wishing to make their COVID-19 data available for research.

Standards/Databases
ODISSEI

The Open Data Infrastructure for Social Science and Economic Innovations (ODISSEI) is the national research infrastructure for the social sciences in the Netherlands.

Standards/Databases
DANS Data Station Life Sciences

The Data Archiving and Networked Services (DANS) is the Dutch national centre of expertise and repository for research data. This data station allows you to deposit and search for data within the fields of medical, health and green life sciences.

Standards/Databases
DANS Data Station Social Sciences and Humanities

The Data Archiving and Networked Services (DANS) is the Dutch national centre of expertise and repository for research data. This data station allows you to deposit and search for data within the social sciences and humanities.

ELSI servicedesk

The ELSI Servicedesk provides guidance on, and answers to questions about, the ethical, legal and social implications of research on personalised medicine and next generation sequencing that life science professionals, policymakers and patients are faced with.

SARS-CoV-2 Database

Norwegian SARS-CoV-2 database

Standards/Databases
Folkehelseinstituttet (FHI)

Norwegian Institute of Public Health (NIPH) portal for infectious disease information

FEGA Norway

Federated European Genome-phenome Archive (EGA) node

European Genome-phenome Archive (EGA)
Swedish Pathogens Portal

The Swedish Pathogens Portal is a hub for data, tools, services, and other resources centred around pathogens, such as SARS-CoV-2, and pandemic preparedness in Sweden.

Swiss Pathogen Surveillance Platform (SPSP)

SPSP is a secure One-health online platform that enables near real-time sharing under controlled access of pathogen genomic data and their associated clinical/epidemiological metadata. During COVID-19, it served as the Swiss SARS-CoV-2 genomic data hub, collecting data, annotating it, communicating reports to the federal public health authorities and openly re-sharing anonymised data on the Covid-19 Data Platform.

Pathogen characterisation
Research Data Management (RDM) sources in Switzerland

RDMkit page on Switzerland’s RDM guidelines and resources.

COVID-19 Data Portal aggregation of Swiss COVID-19 data

COVID-19 Data Portal aggregation of Swiss COVID-19 data

V-pipe

V-pipe is a bioinformatics pipeline that integrates various open-source software packages for assessing viral genetic diversity from next-generation sequencing (NGS) data derived from intra-host virus populations.

Pathogen characterisation
ViralZone

ViralZone is a SIB Swiss Institute of Bioinformatics web resource for all viral genera and families, providing general molecular and epidemiological information, along with virion and genome figures. Each virus or family page gives easy access to UniProtKB/Swiss-Prot viral protein entries.

Pathogen characterisation
Nextstrain

Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. It provides a continually updated view of publicly available data alongside powerful analytic and visualization tools for use by the community. The goal is to aid epidemiological understanding and improve outbreak response.

Pathogen characterisation
CoV Spectrum

CoV-Spectrum is an interactive tool to analyze and discover variants of SARS-CoV-2. Main features include a powerful search engine that supports amino acid and nucleotide mutation filtering, the comparison of multiple variants, and a built-in fitness advantage estimation model.

Pathogen characterisation
Covariants

CoVariants provides an overview of SARS-CoV-2 variants and mutations that are of interest. It displays what mutations define a variant, what impact they might have (with links to papers and resources), and where variants are found, and it links to the variants in Nextstrain.

Pathogen characterisation
COVTriage

COVTriage is a search engine developed as part of SIBiLS (Swiss Institute of Bioinformatics Literature Services), whose purpose is to rank the COVID-19 literature (Medline, PMC, Cord-19) according to the 9 axes of the COVoc ontology (controlled vocabulary to support literature triage for COVID-19). This resource supports COVID-19 / SARS-CoV-2 research.

Pathogen characterisation
Computational Linguistics for COVID-19

Automatically processes COVID-19-related scientific publications to detect mentions of domain-specific entities of particular relevance (such as genes, symptoms, drugs and organs). The aim is to enhance accessibility to the literature, for example by simplifying the search for papers dealing with a particular gene or by identifying unexpected connections between different entities.

Pathogen characterisation