Tools and resources list

This page captures all tools and resources mentioned across the Infectious Diseases Toolkit.

Tool or resource	Description	Related pages	Registry
1+ Million Genomes (1+MG)	The 1+ Million Genomes (1+MG) initiative aims to enable secure access to genomics and the corresponding clinical data across Europe for better research, personalised healthcare and health policy making. Since the Digital Day 2018, 25 EU countries, the UK and Norway have signed the Member States' declaration on stepping up efforts towards creating a European data infrastructure for genomic data and implementing common national rules enabling federated data access. The initiative forms part of the EU's agenda for the Digital Transformation of Health and Care and is aligned with the goals of the European Health Data Space.		Training
ABRicate	ABRicate is used for mass screening of contigs for antimicrobial resistance or virulence genes.	Pathogen characterisation	Tool info Training
ACE Cohort	Asymptomatic COVID-19 in Education (ACE) Cohort	Human biomolecular data	Training
ADA-M	Responsible sharing of biomedical data and biospecimens via the Automatable Discovery and Access Matrix (ADA-M). The Automatable Discovery and Access Matrix (ADA-M) provides a standardized way to unambiguously represent the conditions related to data discovery and access. By adopting ADA-M, data custodians can generally describe what their data are (the Header section), who can access them (the Permissions section), terms related to their use (the Terms section), and special conditions (the Meta-Conditions). By doing so, data custodians can participate in data sharing and collaboration by making meta information about their data computer-readable and hence directly available for digital communication, searching and automation activities.	Human biomolecular data	Tool info
AgrVATE	AgrVATE is a tool for rapid identification of Staphylococcus aureus agr locus type and also reports possible variants in the agr operon.	Pathogen characterisation	Tool info
AMRfinderplus	NCBI Antimicrobial Resistance Gene Finder (AMRFinderPlus)	Pathogen characterisation	Tool info
ANNOVAR	ANNOVAR is an efficient software tool to utilize up-to-date information to functionally annotate genetic variants detected from diverse genomes.	Human biomolecular data Pathogen characterisation	Tool info
apex	The Absolute Protein Expression (APEX) Quantitative Proteomics Tool is a free and open source Java implementation of the APEX technique for the quantitation of proteins based on standard LC-MS/MS proteomics data.	Pathogen characterisation	Tool info
ARIBA	ARIBA (Antimicrobial Resistance Identification By Assembly) identifies antimicrobial resistance genes by running local assemblies.	Pathogen characterisation
ArrayExpress	ArrayExpress is a database of functional genomics experiments that can be queried and the data downloaded. It includes gene expression data from microarray and high throughput sequencing studies. Data is collected to MIAME and MINSEQE standards. Experiments are submitted directly to ArrayExpress or are imported from the NCBI GEO database.	Human biomolecular data Linked pathogen and ho...	Tool info Standards/Databases Training
artic	artic is a pipeline and set of accompanying tools for working with viral nanopore sequencing data, generated from tiling amplicon schemes.	Pathogen characterisation	Tool info Standards/Databases Training
Arvados	With Arvados, bioinformaticians run and scale compute-intensive workflows, developers create biomedical applications, and IT administrators manage large compute and storage resources.	Human biomolecular data
Bakta	Bakta is a tool for the rapid & standardized annotation of bacterial genomes and plasmids from both isolates and MAGs.	Pathogen characterisation	Tool info Training
Bcftools	Bcftools is a set of tools for working with variant calls in the VCF format.	Pathogen characterisation Human biomolecular data	Tool info Training
Beacon v2	Beacon v2 is a protocol/specification established by the Global Alliance for Genomics and Health initiative (GA4GH) that defines an open standard for federated discovery of genomic data and associated information in biomedical research and clinical applications.	Human biomolecular data Human clinical and hea...	Tool info Standards/Databases Training
BEAST	BEAST is a cross-platform program for Bayesian phylogenetic analysis, estimating rooted, time-measured phylogenies using strict or relaxed molecular clock models. It uses Markov chain Monte Carlo (MCMC) to average over tree space and includes a graphical user interface for setting up analyses and tools for result analysis.	Pathogen characterisation	Tool info
BEAUti	BEAUti is a graphical user-interface (GUI) application for generating BEAST XML files.	Pathogen characterisation
Bento platform	The Bento platform enables the research community to explore the BQC19 cohort aggregate data.	Human biomolecular data
Beyond 1 Million Genomes (B1MG)	The Beyond 1 Million Genomes (B1MG) project is helping to create a network of genetic and clinical data across Europe. The project provides coordination and support to the 1+ Million Genomes Initiative (1+MG). This initiative is a commitment of 24 EU countries, the UK and Norway to give cross-border access to one million sequenced genomes by 2022.
BioGRID	BioGRID is a comprehensive biomedical repository for curated protein, genetic and chemical interactions	Human biomolecular data Pathogen characterisation	Tool info Standards/Databases
BioPortal	A comprehensive repository of biomedical ontologies	Human clinical and hea...	Tool info Standards/Databases Training
BioSamples	BioSamples stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry. Samples are either 'reference' samples (e.g. from 1000 Genomes, HipSci, FAANG) or have been used in an assay database such as the European Nucleotide Archive (ENA) or ArrayExpress. It provides links to assays and specific samples, and accepts direct submissions of sample information.	Human biomolecular data Linked pathogen and ho...	Tool info Standards/Databases Training
BioStudies	The BioStudies database holds descriptions of biological studies, links to data from these studies in other databases at EMBL-EBI or outside, as well as data that do not fit in the structured archives at EMBL-EBI. The database can accept a wide range of types of studies described via a simple format. It also enables manuscript authors to submit supplementary information and link to it from the publication.	FAIR data Linked pathogen and ho...	Tool info Standards/Databases Training
Bismark	Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step.	Human biomolecular data	Tool info Training
Bitbucket	Git based code hosting and collaboration tool, built for teams.	Human biomolecular data Pathogen characterisation	Standards/Databases
Bowtie2	Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.	Human biomolecular data Pathogen characterisation	Tool info Training
BWA	BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.	Human biomolecular data Pathogen characterisation	Tool info Training
C-WAP	CFSAN Wastewater Analysis Pipeline to estimate the percentage of SARS-CoV-2 variants in a sample.	Pathogen characterisation
camflow	CamFlow is a Linux Security Module (LSM) designed to capture data provenance for the purpose of system audit	General guidelines
Canu	Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing.	Human biomolecular data Pathogen characterisation	Tool info
CellDesigner	CellDesigner is a structured diagram editor for drawing gene-regulatory and biochemical networks.	Pathogen characterisation
Centrifuge	Tool to classify the taxonomic origin of a read in pathogen sequencing data	Pathogen characterisation Human biomolecular data	Tool info
CESSDA Data Catalogue (CDC)	CDC is a one-stop shop for searching and finding European social science data.	Socioeconomic data	Tool info Standards/Databases
CESSDA Vocabulary Service	CESSDA Vocabulary Service enables users to discover, browse, and download controlled vocabularies in a variety of languages. The service is provided by the Consortium of European Social Science Data Archives (CESSDA). The majority of the source (English) vocabularies included in the service have been created by the DDI Alliance. The Data Documentation Initiative (DDI) is an international standard for describing data produced by surveys and other observational methods in the social, behavioural, economic, and health sciences.		Standards/Databases
Chenomx	A commercial software package for NMR spectral processing that offers a semi-automated tool for spectral deconvolution, enabling interactive fitting of metabolite peaks to reference spectra and quantifying their concentrations.	Pathogen characterisation
chewBBACA	chewBBACA is a software suite for the creation and evaluation of core genome and whole genome MultiLocus Sequence Typing (cg/wgMLST) schemas and results.	Pathogen characterisation	Tool info
Clark	Clark is a fast, versatile and accurate sequence classification system.	Pathogen characterisation	Tool info Training
ClustalW	ClustalW is a progressive multiple sequence alignment tool to align a set of sequences by repeatedly aligning pairs of sequences and previously generated alignments.	Human biomolecular data Pathogen characterisation	Tool info Training
COJAC	The cojac package comprises a set of command-line tools to analyse co-occurrence of mutations on amplicons.	Pathogen characterisation	Tool info
CoMet	A workflow using contig coverage and composition for binning a metagenomic sample with high precision	Pathogen characterisation	Tool info
Common Workflow Language (CWL)	An open standard for describing workflows that are built from command line tools	General guidelines Human clinical and hea...	Standards/Databases Training
COMPSs	COMP Superscalar (COMPSs) is a task-based programming model which aims to ease the development of applications for distributed infrastructures, such as large High-Performance clusters (HPC), clouds and container managed clusters.	General guidelines Human clinical and hea...	Tool info
COVID-19 BEACON	The COVID-19 Beacon is a searchable platform for SARS-CoV-2 genomic variants conforming to the Beacon specifications of the Global Alliance for Genomics and Health but adjusted for viral genome searches.	Human biomolecular data
COVID-19 Data Portal	The COVID-19 Data Portal enables researchers to upload, access and analyse COVID-19 related reference data and specialist datasets. The aim of the COVID-19 Data Portal is to facilitate data sharing and analysis, and to accelerate coronavirus research. The portal includes relevant datasets submitted to EMBL-EBI as well as other major centres for biomedical data. The COVID-19 Data Portal is the primary entry point into the functions of a wider project, the European COVID-19 Data Platform.	FAIR data Human biomolecular data Human clinical and hea... Socioeconomic data The Swedish Pathogens ...	Tool info Standards/Databases Training
COVID19 Disease Map	The COVID-19 Disease Map is an assembly of molecular interaction diagrams, established based on literature evidence.	Pathogen characterisation
COWWID	A GitHub repository from the CBG-ETHZ group offering tools for detecting SARS-CoV-2 variants in Switzerland.
CRG COVID-19 Viral Beacon	A platform allowing for browsing SARS-CoV-2 variability at the genome, amino acid, structural, and motif levels	Human biomolecular data An automated SARS-CoV-...
Cromwell	Cromwell is a Workflow Management System geared towards scientific workflows.	Human biomolecular data
CS_Score	R package for cell-type-specific co-expression inference from single cell RNA-sequencing data.	Human biomolecular data
Cutadapt	Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.	Pathogen characterisation	Tool info Training
cwltool	Reference implementation to provide comprehensive validation of CWL files as well as provide other tools related to working with CWL.	Human biomolecular data
Cytoscape	Cytoscape provides a solid platform for network visualization and analysis	Human biomolecular data Pathogen characterisation	Tool info Training
DAGitty	DAGitty is a browser-based environment for creating, editing, and analyzing causal diagrams (also known as directed acyclic graphs or causal Bayesian networks).	Prototyping federated ...	Tool info Standards/Databases
Danish Research Health Data Gateway	Tool that provides the user with an overview of available health data in Denmark and the entire application process, from initial idea to final application.	Human clinical and hea...
Data Structure Wizard (DSW)	The Data Structure Wizard (DSW) is a Java standalone desktop application that supports version 2.0 & 2.1 of the SDMX standard.	Socioeconomic data
data.validator R package	Validates datasets by columns and rows using convenient predicates inspired by the 'assertr' package.	Socioeconomic data
DAVID	The Database for Annotation, Visualization and Integrated Discovery (DAVID) provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind large lists of genes.	Human biomolecular data	Tool info Training
dbGaP	The Database of Genotypes and Phenotypes (dbGaP) archives and distributes the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits.	FAIR data Human biomolecular data	Tool info Standards/Databases Training
dbNSFP	A comprehensive database of transcript-specific functional predictions and annotations for human non-synonymous and splice-site SNVs	Human biomolecular data Pathogen characterisation	Tool info
DCAT	An RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications easily to consume metadata from multiple catalogs.	Human clinical and hea... Human biomolecular data	Standards/Databases
DDI Tools	Searchable list of tools available to help you work with DDI, from authoring and editing metadata to data transformations. The tools have been developed independently by a variety of organizations from the global DDI community.
DeconSeq	Tool to remove human reads from pathogen sequencing data.	Human biomolecular data	Tool info
dedupe	Python library that uses machine learning to perform fuzzy matching, deduplication, and entity resolution quickly on structured data.	Socioeconomic data	Training
DeepVariant	DeepVariant is a deep learning-based variant caller that takes aligned reads (in BAM or CRAM format), produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports the results in a standard VCF or gVCF file.	Human biomolecular data	Tool info
Delly	Delly is an integrated structural variant (SV) prediction method that can discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read and long-read massively parallel sequencing data.	Human biomolecular data	Tool info
DESeq2	Differential gene expression analysis based on the negative binomial distribution	Human biomolecular data Pathogen characterisation	Tool info Training
DFAST	DFAST is a flexible and customizable pipeline for prokaryotic genome annotation as well as data submission to the INSDC	Pathogen characterisation	Tool info
DIAMOND	DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.	Pathogen characterisation	Tool info Training
dlookr	A collection of tools that support data diagnosis, exploration, and transformation.	Socioeconomic data
Docker	Docker is software for executing applications in virtualized environments called containers. It is linked to Docker Hub, a library for sharing container images.	Human biomolecular data	Standards/Databases Standards/Databases Training
Dorado	Dorado is a high-performance, easy-to-use, open source basecaller for Oxford Nanopore reads.	Pathogen characterisation	Tool info
dplyr	dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges.	Socioeconomic data Socioeconomic data	Training
Dragen-GATK	DRAGEN-GATK Best Practices contains open-source workflows that are compatible between Illumina's platforms and mainstream infrastructure.	Human biomolecular data Pathogen characterisation
Dragonflye	Dragonflye is a pipeline that aims to make assembling Oxford Nanopore reads quick and easy.	Pathogen characterisation
Dryad	Dryad is an open-source, community-led data curation, publishing, and preservation platform for CC0 publicly available research data.	Human biomolecular data	Standards/Databases
dupRadar	dupRadar is used for the assessment of duplication rates in RNA-Seq datasets.	Pathogen characterisation	Tool info
Dutch COVID-19 Data Portal	The Dutch COVID-19 Data Portal provides researchers with a clear overview of what is available, allows searching for specific data, and makes access to such data easier when the necessary ethical and legal conditions have been met.	Human biomolecular data
EBI	The European Bioinformatics Institute is a bioinformatics research center that is part of the European Molecular Biology Laboratory and is located in Hinxton, England. The institution combines intense research activity with the development and maintenance of a set of bioinformatics lines, services and databases.	Human biomolecular data	Training
ECTyper	ECTyper is a standalone versatile serotyping module for Escherichia coli. It supports both fasta (assembled) and fastq (raw reads) file formats.	Pathogen characterisation	Tool info
EdgeR	Empirical Analysis of Digital Gene Expression Data in R	Human biomolecular data	Tool info Training
EGA Beacon	Interface for querying EGA data through Beacon v2	Human biomolecular data
emmtyper	emmtyper is a command line tool for emm-typing of Streptococcus pyogenes using a de novo or complete assembly.	Pathogen characterisation	Tool info
ENA upload CLI	Command line tool (CLI) allowing easy submission of data and respective metadata to the European Nucleotide Archive (ENA) using tabular files or an Excel spreadsheet. The tool allows programmatic submission of all ENA objects (study, sample, run and experiment) without the need to log in to the Webin interface. This also includes client-side validation using ENA checklists and releasing the ENA objects.	Using the ENA data sub... SARS-CoV-2 sequencing ...
ENA upload Galaxy tool	Galaxy tool wrapper of the ENA upload CLI to submit experimental data and respective metadata to the European Nucleotide Archive (ENA).	Using the ENA data sub... SARS-CoV-2 sequencing ...
ENA Webin CLI	Galaxy wrapper to submit consensus sequences to ENA in an interactive way. The tool has the Webin-CLI script of ENA at its core and supports all sample checklists.	Using the ENA data sub...
Enrichr	Functional Enrichment Analysis and Network Construction	Pathogen characterisation	Tool info
Estonian Biobank	The Estonian Biobank has established a population-based biobank of Estonia with a current cohort size of more than 200,000 individuals (genotyped with genome-wide arrays), reflecting the age, sex and geographical distribution of the adult Estonian population. Considering the fact that about 20% of Estonia's adult population has joined the programme, it is indeed a database that is very important for the development of medical science both domestically and internationally.	Human biomolecular data
Estonian COVID-19 Data Portal	Estonian instance of the COVID-19 Data Portal. Among other information, it served Estonian SARS-CoV-2 sequencing dashboards.	SARS-CoV-2 sequencing ... The Swedish Pathogens ...
EUI COVID-19 SSH Data Portal	The COVID-19 SSH Data Portal provides integrated search, discovery, and linking to datasets published on the web relevant for COVID-19-related research in the Social Sciences and Humanities.	Socioeconomic data	Standards/Databases
EuroHPC	EuroHPC Joint Undertaking is a joint initiative between the EU, European countries and private partners to develop a World Class Supercomputing Ecosystem in Europe.	Pathogen characterisation
European Centre for Disease Prevention and Control (ECDC)	ECDC is an EU agency that aims to strengthen Europe's defences against infectious diseases. Its mission is to identify, assess and communicate current and emerging threats to human health posed by infectious diseases.	Human clinical and hea...
European Clinical Research Infrastructure Network (ECRIN) tools	ECRIN develops, contributes to, and maintains freely accessible tools that facilitate the identification of clinical trial objects, data sharing, access to regulatory and methodological designs and much more to support researchers looking to conduct multinational clinical research.	Human clinical and hea...
European Genome-phenome Archive (EGA)	The European Genome-phenome Archive (EGA) is a service for permanent archiving and sharing of personally identifiable genetic, phenotypic, and clinical data generated for the purposes of biomedical research projects or in the context of research-focused healthcare systems. Access to data must be approved by the specified Data Access Committee (DAC).	FAIR data Human biomolecular data Human clinical and hea... Linked pathogen and ho...	Tool info Standards/Databases Training
European Health Data Space (EHDS)	The European Health Data Space is a health specific ecosystem comprised of rules, common standards and practices, infrastructures and a governance framework that aims at empowering individuals through increased digital access to and control of their electronic personal health data, at national level and EU-wide.	Human clinical and hea... Human clinical and hea...
European Health Information Portal	The Health Information Portal provides access to population health and healthcare data across Europe.	FAIR data Human clinical and hea...	Standards/Databases
European Language Social Science Thesaurus (ELSST)	The European Language Social Science Thesaurus (ELSST) is a broad-based, multilingual thesaurus for the social sciences. It is owned and published by the Consortium of European Social Science Data Archives (CESSDA) and its national Service Providers.		Standards/Databases
European Medicines Agency (EMA)	The European Medicines Agency (EMA) is a decentralised agency of the European Union (EU). It is responsible for the scientific evaluation, supervision and safety monitoring of medicines.	Human clinical and hea...
European Nucleotide Archive (ENA)	Provides a record of the nucleotide sequencing information. It includes raw sequencing data, sequence assembly information and functional annotation.	Pathogen characterisation Human clinical and hea... Pathogen characterisation Human biomolecular data An automated SARS-CoV-... Using the ENA data sub... SARS-CoV-2 sequencing ... Linked pathogen and ho...	Tool info Standards/Databases Training
FAIRsharing	FAIRsharing is a FAIR-supporting resource that provides an informative and educational registry on data standards, databases, repositories and policy, alongside search and visualization tools and services that interoperate with other FAIR-enabling resources. FAIRsharing guides consumers to discover, select and use standards, databases, repositories and policy with confidence, and producers to make their resources more discoverable, more widely adopted and cited. Each record in FAIRsharing is curated in collaboration with the maintainers of the resource themselves, ensuring that the metadata in the FAIRsharing registry is accurate and timely.	Pathogen characterisation Human biomolecular data Ethical, Legal, and So...	Standards/Databases Training
fastp	A tool designed to provide ultrafast all-in-one preprocessing and quality control for FastQ data.	Pathogen characterisation	Tool info Training
FASTQC	A quality control tool for high throughput sequence data.	Pathogen characterisation Human biomolecular data Pathogen characterisation	Tool info Training
FastQ Screen	FastQ Screen is a quality control tool used to detect contamination in sequencing data (FASTQ files).	Human biomolecular data
FastTree	FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences	Pathogen characterisation	Tool info Training
Federated EGA	The Federated EGA is an infrastructure built upon the European Genome-phenome Archive (EGA), an EMBL-EBI and CRG data resource for secure archiving and sharing of human sensitive biomolecular and phenotypic data resulting from biomedical research projects.	Human biomolecular data Human clinical and hea...	Training
Figshare	Figshare is a generalist, subject-agnostic repository for many different types of digital objects that can be used without cost to researchers. Data can be submitted to the central figshare repository (described here), or institutional repositories using the figshare software can be installed locally, e.g. by universities and publishers.	Human biomolecular data	Standards/Databases Training
Findata	Findata is the data permit authority for the social and health care sector in Finland.	Human clinical and hea...
FluSurver	The main application scenario for FluSurver is to highlight phenotypically or epidemiologically interesting candidate mutations for further research. It should ideally be combined with experimental testing and verification of any predicted phenotypes.	Pathogen characterisation
Flye	Flye is a de novo assembler for single-molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies.	Human biomolecular data Pathogen characterisation	Tool info Training
freebayes	freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs, indels, MNPs, and complex events smaller than the length of a short-read sequencing alignment.	Human biomolecular data Pathogen characterisation	Tool info Training
French Health Data Hub	The French Health Data Hub guarantees easy, unified, transparent and secure access to health data to improve the quality of care and patient support.	Human clinical and hea...
Freyja	Freyja is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference).	Pathogen characterisation	Tool info
g:Profiler	g:GOSt performs functional enrichment analysis, also known as over-representation analysis (ORA) or gene set enrichment analysis, on an input gene list.	Pathogen characterisation	Tool info Training
Galaxy	Open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses.	Human biomolecular data Pathogen characterisation General guidelines Human clinical and hea... Using the ENA data sub...	Tool info Training
Galaxy Europe	The European Galaxy server. Provides access to thousands of tools for scalable and reproducible analysis.	Pathogen characterisation An automated SARS-CoV-...	Training
Galaxy University of Tartu	The University of Tartu Galaxy instance. It enables local university users to run their analyses in the Galaxy environment and was heavily used during the KoroGenoEST sequencing studies.	SARS-CoV-2 sequencing ...
Ganon	ganon2 classifies DNA sequences against large sets of genomic reference sequences efficiently.	Pathogen characterisation	Tool info
GenBank	GenBank is the NIH genetic sequence database of annotated collections of all publicly available DNA sequences.	Human biomolecular data	Tool info Standards/Databases Training
GenCoF	Tool to remove human reads from pathogen sequencing data	Human biomolecular data
GeneMANIA	GeneMANIA helps you predict the function of your favourite genes and gene sets.	Human biomolecular data	Tool info Training
Geno2pheno	Estimating phenotypic drug resistance from HIV-1 genotypes associated with resistance to PRO and RT inhibitors	Pathogen characterisation	Tool info
Genome Analysis Toolkit (GATK)	GATK is a widely used tool for variant calling and genotyping from NGS data.	Human biomolecular data	Tool info
Genome Detective	Genome Detective offers intuitive Bio-Informatics applications for the analysis of microbial molecular sequence data.	Pathogen characterisation
Genomic Data Infrastructure (GDI)	The Genomic Data Infrastructure (GDI) project is enabling access to genomic and related phenotypic and clinical data across Europe. It is doing this by establishing a federated, sustainable and secure infrastructure to access the data. It builds on the outputs of the Beyond 1 Million Genomes (B1MG) project and is realising the ambition of the 1+Million Genomes (1+MG) initiative.
Genotype-Tissue Expression (GTEx)	The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 53 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank. The GTEx Portal provides open access to data including gene expression, QTLs, and histology images.	Human biomolecular data	Tool info Standards/Databases
GEO	The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. It accepts next-generation sequence data that examine quantitative gene expression, gene regulation, epigenomics or other aspects of functional genomics using methods such as RNA-seq, miRNA-seq, ChIP-seq, RIP-seq, HiC-seq, methyl-seq, etc. GEO will process all components of your study, including the samples, project description and processed data files, and will submit the raw data files to the Sequence Read Archive (SRA) on the researcher's behalf. In addition to data storage, a collection of web-based interfaces and applications is available to help users query and download the studies and gene expression patterns stored in GEO.	Human biomolecular data	Standards/Databases Training
ggplot2	ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.	Human biomolecular data	Tool info Training
GitHub	GitHub is a versioning system, used for sharing code, as well as for sharing of small data.	Human biomolecular data Pathogen characterisation An automated pipeline ...	Standards/Databases Standards/Databases Training
GitLab	GitLab is an open source end-to-end software development platform with built-in version control, issue tracking, code review, CI/CD, and more. Self-host GitLab on your own servers, in a container, or on a cloud provider.	Human biomolecular data Pathogen characterisation	Standards/Databases Training
Global Alliance for Genomics and Health (GA4GH)	The metadata model for GA4GH, an international coalition of both public and private interested parties, formed to enable the sharing of genomic and clinical data.	Human biomolecular data	Tool info Standards/Databases Training
Global Initiative on Sharing All Influenza Data (GISAID)	A web-based platform for sharing viral sequence data, initially for influenza data, and now for other pathogens (including SARS-CoV-2).	Pathogen characterisation Human clinical and hea... Pathogen characterisation	Standards/Databases
GO	GO is used to perform enrichment analysis on gene sets.	Human biomolecular data Pathogen characterisation	Tool info Training
GRAF pop	GRAF pop is a software tool that infers the subject ancestry.	Human biomolecular data
GRAF sex	Tool that determines subject sexes using the genotypes.	Human biomolecular data
GRIDSS	GRIDSS is a modular software suite containing tools useful for the detection of genomic rearrangements.	Human biomolecular data	Tool info
GSEA	Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states	Human biomolecular data	Tool info Training
Guppy	Guppy is a bioinformatics toolkit that enables real-time basecalling and several post-processing features that work on Oxford Nanopore Technologies™ sequencing platforms.	Pathogen characterisation	Tool info
hAMRonization	hAMRonization is a software tool to harmonize and standardize antimicrobial resistance (AMR) data generated by various bioinformatics tools.	Pathogen characterisation	Tool info
HCV-GLUE	HCV-GLUE is a bioinformatics resource for HCV sequence data.	Pathogen characterisation
Health Data Research UK	HDR UK is a national institute with the aim to unite the UK’s health and care data to enable discoveries that improve people’s lives. We do this by uniting, improving and using health and care data as one national institute.	Human clinical and hea...
hicap	The cap locus of H. influenzae is categorised into 6 different groups based on serology (a-f).	Pathogen characterisation
HISAT2	HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) to a population of human genomes (as well as to a single reference genome).	Human biomolecular data Pathogen characterisation	Tool info Training
IGV	The Integrative Genomics Viewer (IGV) is a high-performance, easy-to-use, interactive tool for the visual exploration of genomic data.	Human biomolecular data	Tool info Training
IntAct	IntAct (Molecular Interaction Database) Website	Human biomolecular data Pathogen characterisation	Tool info Standards/Databases Training
IQtree	IQ-TREE is designed to efficiently handle large phylogenomic datasets, utilize multicore and distributed parallel computing for faster analysis, and automatically resume interrupted analyses through checkpointing.	Pathogen characterisation	Tool info Training
IRMA	IRMA (Iterative Refinement Meta-Assembler) was designed for the robust assembly, variant calling, and phasing of highly variable RNA viruses.	Pathogen characterisation	Tool info
ISARIC COVID-19 Case Report Form	The ISARIC-WHO Case Report Forms (CRFs) should be used to collect data on individuals presenting with suspected or confirmed COVID-19, with the aim to standardise clinical data to improve patient care and inform the public health response.	Linked pathogen and ho...	Standards/Databases
iVar	iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing.	Pathogen characterisation	Tool info Training
Kaiju	Kaiju is a program for the taxonomic classification of high-throughput sequencing reads, e.g., Illumina or Roche/454, from whole-genome sequencing of metagenomic DNA.	Pathogen characterisation	Tool info
Kallisto	Kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.	Pathogen characterisation	Tool info
KEGG	A set of annotation maps for the Kyoto Encyclopedia of Genes and Genomes (KEGG).	Human biomolecular data	Tool info Training
Kleborate	Kleborate was primarily developed to screen genome assemblies of Klebsiella pneumoniae and the Klebsiella pneumoniae species complex (KpSC).	Pathogen characterisation	Tool info
KMCP	KMCP is a tool for accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping.	Pathogen characterisation
KmerFinder	KmerFinder is a bioinformatics tool designed for the rapid identification of bacterial species and strains from whole genome sequencing (WGS) data.	Pathogen characterisation	Tool info
Kraken	Tool to classify the taxonomic origin of a read in pathogen sequencing data.	Human biomolecular data	Tool info Training
Kraken 2	A taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds.	Pathogen characterisation
KrakenUniq	False-positive identifications are a significant problem in metagenomics classification.	Pathogen characterisation	Tool info
Krona Tools	Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.	Pathogen characterisation	Tool info
legsta	In silico Legionella pneumophila Sequence Based Typing (SBT).	Pathogen characterisation	Tool info
LimeSurvey	LimeSurvey is a free and open source advanced online survey system to create online surveys.
Lineagespot	Lineagespot is a framework written in R that aims to identify SARS-CoV-2 related mutations based on a single variant file or a list of variant files.	Pathogen characterisation	Tool info
LisSero	In silico serogroup typing prediction for Listeria monocytogenes	Pathogen characterisation
LoFreq*	LoFreq* (i.e. LoFreq version 2) is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data	Pathogen characterisation	Training
LongQC	LongQC is a tool for the data quality control of the PacBio and ONT long reads, and it has two functionalities: sample qc and platform qc.	Pathogen characterisation
Lumpy	A probabilistic framework for structural variant discovery.	Human biomolecular data	Tool info
MACS	Model-based Analysis of ChIP-Seq (MACS), for identifying transcription factor binding sites.	Human biomolecular data	Tool info Training
MAFFT	MAFFT is a multiple sequence alignment program	Human biomolecular data Pathogen characterisation	Tool info Training
Manta	Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads.	Human biomolecular data	Tool info
matplotlib	Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.	Human biomolecular data Socioeconomic data	Tool info Training
MAXQUANT	MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data.	Pathogen characterisation	Tool info Training
Medaka	medaka is a tool to create consensus sequences and variant calls from nanopore sequencing data.	Pathogen characterisation	Tool info Standards/Databases Training
MEGAHIT	MEGAHIT is an ultra-fast and memory-efficient NGS assembler optimized for metagenomes.	Pathogen characterisation	Tool info Training
MEGAN	MEGAN is used for the interactive exploration and analysis of large-scale microbiome sequencing data.	Pathogen characterisation	Tool info
meningotype	In silico typing of Neisseria meningitidis contigs.	Pathogen characterisation	Tool info
MetaboAnalyst	MetaboAnalyst is a comprehensive platform dedicated to metabolomics data analysis via a user-friendly, web-based interface.	Human biomolecular data Pathogen characterisation	Tool info Training
Metagen-FastQC	Cleans metagenomic reads to remove adapters, low-quality bases and host (e.g. human) contamination.	Using the ENA data sub... SARS-CoV-2 sequencing ...
MetaGeniE	Tool to remove human reads from pathogen sequencing data.	Human biomolecular data
MetaPhlAn	MetaPhlAn is a computational tool for species-level microbial profiling (bacteria, archaea, eukaryotes, and viruses) from metagenomic shotgun sequencing data.	Pathogen characterisation	Tool info Training
MethylKit	methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing.	Human biomolecular data	Tool info
methylPipe	Base resolution DNA methylation data analysis	Human biomolecular data	Tool info
MetSign	A computational platform for high-resolution mass spectrometry-based metabolomics	Human biomolecular data
MIABIS	MIABIS represents the minimum information required to initiate collaborations between biobanks and to enable the exchange of biological samples and data. The aim is to facilitate the reuse of bio-resources and associated data by harmonizing biobanking and biomedical research.	Human biomolecular data	Standards/Databases
mice	Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>.	Socioeconomic data	Training
Miniasm	Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format.	Pathogen characterisation	Tool info Training
Minimap2	Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database.	Pathogen characterisation	Tool info Training
MinIONQC	Fast and effective quality control for MinION and PromethION sequencing data	Pathogen characterisation
missingno	missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities that allows you to get a quick visual summary of the completeness (or lack thereof) of your dataset.	Socioeconomic data
missMethods	Supply functions for the creation and handling of missing data as well as tools to evaluate missing data methods	Socioeconomic data
mOTUs	The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.	Pathogen characterisation	Tool info Training
Mouse Brain Alignment Tool (MBAT)	Semi-automated processing of autoradiography (ARG) images from mouse brain tissue. This project includes a Napari-based user interface where ARG slides can be preprocessed and registered to Allen Brain Atlas regions.	An automated pipeline ...
MrBayes	MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.	Pathogen characterisation	Tool info
MTBseq	MTBseq is an automated pipeline for mapping, variant calling and detection of resistance mediating and phylogenetic variants from Illumina whole genome sequence data of Mycobacterium tuberculosis complex isolates.	Pathogen characterisation
MultiQC	MultiQC searches a given directory for analysis logs and compiles an HTML report.	Pathogen characterisation Pathogen characterisation	Tool info Training
MUSCLE	MUSCLE is widely-used software for making multiple alignments of biological sequences.	Human biomolecular data Pathogen characterisation	Tool info Training
Mzmine	MZmine 3 is an open-source software for mass-spectrometry data processing, with the main focus on LC-MS data.	Human biomolecular data Pathogen characterisation	Tool info
naniar	The package naniar is used for exploring missing data structures with minimal deviation from the common workflows of ggplot and tidy data (Wickham, 2014, Wickham, 2009).	Socioeconomic data
NanoCaller	NanoCaller is a computational method that integrates long reads in deep convolutional neural network for the detection of SNPs/indels from long-read sequencing data.	Pathogen characterisation	Tool info
Nanofilt	Filtering and trimming of long read sequencing data.	Pathogen characterisation
NanoPlot	NanoPlot is a plotting tool for long read sequencing data and alignments.	Pathogen characterisation	Tool info Training
Nanopolish	Software package for signal-level analysis of Oxford Nanopore sequencing data.	Pathogen characterisation	Tool info
nanoq	nanoq is an ultra-fast tool for quality control and summary reporting of nanopore reads.	Pathogen characterisation	Training
National Center for Biotechnology Information (NCBI)	The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information.	Human biomolecular data	Training
Nextclade	Nextclade is a tool for viral genome clade assignment, mutation calling, and sequence quality checks.	Pathogen characterisation	Tool info Training
Nextflow	Nextflow is a framework for data analysis workflow execution	Human biomolecular data Pathogen characterisation	Tool info Training
Nextstrain	Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data.	Pathogen characterisation	Tool info Standards/Databases Training
Nextstrain Auspice	Estonian local instance of the Nextstrain Auspice application that serves SARS-CoV-2 phylogenetic data	SARS-CoV-2 sequencing ...
ngmaster	In silico multi-antigen sequence typing for Neisseria gonorrhoeae (NG-MAST) and Neisseria gonorrhoeae sequence typing for antimicrobial resistance (NG-STAR).	Pathogen characterisation
noWorkFlow	The noWorkflow project aims at allowing scientists to benefit from provenance data analysis even when they don't use a workflow system.	General guidelines
NumPy	Python library for scientific computing.	Socioeconomic data Socioeconomic data	Tool info Training
NUTS	The NUTS (Nomenclature of territorial units for statistics) classification, developed by Eurostat, is a hierarchical system for dividing up the economic territory of the EU and the UK.		Training
ODISSEI Secure ANalysis Environment (SANE)	SANE is a virtual container in which the researcher can analyse sensitive data, while the data owner retains full control.	Socioeconomic data
OHDSI	The R package will perform data quality checks against an OMOP CDM instance.	Human clinical and hea...	Tool info
Omics Integrator	Omics Integrator is a package designed to integrate proteomic data, gene expression data and/or epigenetic data using a protein-protein interaction network.	Human biomolecular data
OMSSA	OMSSA (Open Mass Spectrometry Search Algorithm) is a tool to identify peptides in tandem mass spectrometry (MS/MS) data. The OMSSA algorithm uses a classic probability score to compute specificity. See also The NCBI C++ Toolkit and The NCBI C++ Toolkit Book.	Pathogen characterisation	Tool info
Ontology Lookup Service (OLS)	EMBL-EBI's web portal for finding ontologies	FAIR data Human clinical and hea...	Tool info Standards/Databases Training
OpenBEL	The OpenBEL Framework is an open-platform technology for managing, publishing, and using biological knowledge represented using the Biological Expression Language (BEL).	Knowledge Graph Genera...
OpenMS	OpenMS is an open-source software C++ library for LC-MS data management and analyses.	Human biomolecular data Pathogen characterisation	Tool info Training
OpenProvenance	Set of user-friendly web applications for storing, validating, and translating W3C PROV-based provenance representations.	General guidelines
OpenRefine	Powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.	Socioeconomic data	Training
outliers	A collection of some tests commonly used for identifying outliers.	Socioeconomic data	Training
pandas	Open source data analysis and manipulation tool, built on top of the Python programming language.	Socioeconomic data Socioeconomic data	Tool info Training
Pangolin	Pangolin is a tool for the Phylogenetic Assignment of Named Global Outbreak LINeages	Pathogen characterisation	Tool info Training
Panther	The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System is a unique resource that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in the absence of direct experimental evidence.	Human biomolecular data	Tool info Standards/Databases Training
pasty	An easy-to-use tool for in silico serogrouping of Pseudomonas aeruginosa isolates.	Pathogen characterisation
Pathogens Portal	The Pathogens Portal, launched in July 2023, is an invaluable resource for researchers, clinicians, and policymakers who need access to the latest and most comprehensive datasets on pathogens. The portal is a collaborative effort between the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) and partners.	Pathogen characterisation Linked pathogen and ho... The Swedish Pathogens ...	Standards/Databases Training
Pathogens Portal Cohort Browser	The Pathogens Portal Cohort Browser presents discovery metadata of infectious disease cohort datasets and provides links to the associated datasets within ELIXIR Core Data Resources; search and filtering functionalities enable users to identify cohort studies of interest in a convenient manner.	Linked pathogen and ho...
Pathogenwatch	Pathogenwatch provides species and taxonomy prediction for over 60,000 variants of bacteria, viruses, and fungi.	Pathogen characterisation
pbptyper	In silico Penicillin Binding Protein (PBP) typer for Streptococcus pneumoniae assemblies	Pathogen characterisation
PEDDY	Sample quality control. Infer family connections thus revealing any sample mixup during sample collection from families.	Human biomolecular data
PepArML	A Meta-Search Peptide Identification Platform for Tandem Mass Spectra	Pathogen characterisation	Tool info
PHES-ODM	A data model to improve wastewater surveillance through interoperable data.	Pathogen characterisation
PhyML	PhyML is a software package that uses modern statistical approaches to analyse alignments of nucleotide or amino acid sequences in a phylogenetic framework.	Human biomolecular data	Tool info
Picard	Picard is a suite of tools that provides quality control and processing of NGS data, including duplicate read removal, format conversion, and alignment.	Pathogen characterisation Human biomolecular data	Tool info
PiGx SARS-CoV-2 Wastewater Sequencing Pipeline	PiGx SARS-CoV-2 is a pipeline for analysing data from sequenced wastewater samples and identifying given lineages of SARS-CoV-2.	Pathogen characterisation
Pilon	Pilon is a software tool which can be used to automatically improve draft assemblies and find variation among strains, including large event detection.	Pathogen characterisation	Tool info Training
PlasmidID	PlasmidID is a mapping-based, assembly-assisted plasmid identification tool that analyzes and gives graphic solution for plasmid identification.	Pathogen characterisation	Tool info
Polypolish	Polypolish is a tool for polishing genome assemblies with short reads.	Pathogen characterisation	Tool info Training
Population Health Information Research Infrastructure (PHIRI)	PHIRI is the roll-out of the research infrastructure on population health information that aims to facilitate and generate the best available evidence for research on health and well-being of populations as impacted by COVID-19.	Human clinical and hea...
Porechop	Porechop is a tool for finding and removing adapters from Oxford Nanopore reads.	Pathogen characterisation
preseq	The preseq package is aimed at predicting the yield of distinct reads from a genomic library from an initial sequencing experiment.	Pathogen characterisation	Tool info
Prokka	Prokka is a software tool to annotate bacterial, archaeal and viral genomes quickly and produce standards-compliant output files.	Pathogen characterisation	Tool info Training
Prov python	Python implementation of the PROV data model.	General guidelines	Training
Provenance storage	A prototype of a provenance management service implementing the CPM (ISO 23494-2).	General guidelines
provR	Collect meta-data from scripts written in the R programming language.	General guidelines
ProvToolbox	Java implementation of the PROV data model.	General guidelines
PyBEL	Pure Python package for parsing and handling biological networks encoded in the Biological Expression Language (BEL).	Knowledge Graph Genera...	Tool info
PycoQC	PycoQC computes metrics and generates interactive QC plots for Oxford Nanopore Technologies sequencing data.	Pathogen characterisation	Tool info Training
pyod	A comprehensive but easy-to-use Python library for detecting anomalies in multivariate data.	Socioeconomic data
QIIME 2	QIIME 2 is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency.	Pathogen characterisation	Tool info Training
Qualimap	Qualimap is a quality control tool that assesses the quality of the sequencing data at different stages of the analysis pipeline, including read mapping, coverage, and expression analysis.	Pathogen characterisation Human biomolecular data	Tool info Training
Quarto	Quarto is an open-source scientific and technical publishing system that enables the creation of dynamic and reproducible content.	Prototyping federated ...	Training
Quast	QUAST stands for QUality ASsessment Tool. It evaluates genome/metagenome assemblies by computing various metrics.	Pathogen characterisation	Tool info Training
R	Free software environment for statistical computing and graphics.		Tool info Training
R Markdown	R Markdown can help to turn your analyses into high quality documents, reports, presentations and dashboards.		Training
R Shiny	Shiny is an R package that makes it easy to build interactive web apps straight from R.		Training
Racon	Racon is intended as a standalone consensus module to correct raw contigs generated by rapid assembly methods which do not include a consensus step.	Pathogen characterisation	Tool info Training
Raven	Raven is a de novo genome assembler for long uncorrected reads.	Pathogen characterisation	Tool info
RAxML	A tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies	Pathogen characterisation	Tool info Training
RDataTracker	An R library to collect provenance from R scripts.	General guidelines
ReAdW	Convert Thermo Finnigan RAW mass spectrometry files to the mzXML format.	Pathogen characterisation	Tool info
recordr	Provenance tracking for R	General guidelines
Research Data Centre at the BfArM	Research at the BfArM concentrates on important and contemporary research focal points with regard to the marketing authorisation of medicinal products and improving the safety thereof as well as concerning the recording and assessment of risks in connection with medical devices.	Human clinical and hea...
Research Object Crate (RO-Crate)	RO-Crate is a lightweight approach to packaging research data with their metadata, using schema.org. An RO-Crate is a structured archive of all the items that contributed to the research outcome, including their identifiers, provenance, relations and annotations.	General guidelines	Standards/Databases Training
ResFinder	ResFinder identifies acquired genes and/or finds chromosomal mutations mediating antimicrobial resistance in total or partial DNA sequence of bacteria.	Pathogen characterisation	Tool info
RSEM	RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data.	Pathogen characterisation	Tool info
RSeQC	RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.	Pathogen characterisation	Tool info Training
Salmon	Salmon is a wicked-fast program to produce highly accurate, transcript-level quantification estimates from RNA-seq data.	Pathogen characterisation	Tool info Training
SAMtools	SAMtools is a suite of programs for interacting with high-throughput sequencing data.	Pathogen characterisation Human biomolecular data Pathogen characterisation	Tool info Training
Sapporo WES	Implementation of Workflow Execution Service (WES) or so-called Workflow-as-a-Service.	General guidelines Human clinical and hea...	Tool info
SARS-CoV-2 Contextual Data Specification	A SARS-CoV-2 Contextual Data Specification from PHA4GE.	Pathogen characterisation
SARS-CoV-2 Data Hubs	Using technology that builds upon existing EMBL-EBI infrastructure, we provide SARS-CoV-2 Data Hubs to those public health agencies and other scientific groups responsible for generating viral sequence data from the outbreak at national or regional levels.	Human biomolecular data
SARS-COV-2 outbreak in Andalucia	SARS-CoV-2 whole genome sequencing circuit of Andalusia	Human biomolecular data
sccmec	sccmec is a tool for typing SCCmec cassettes in assemblies.	Pathogen characterisation
Schema.org	Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond.	Human clinical and hea... General guidelines	Standards/Databases Training
Scikit-learn	Machine learning tools in Python	Socioeconomic data Socioeconomic data	Tool info Training
SDMX Registry	SDMX Fusion Metadata Registry	Socioeconomic data
SDMX-Reference Infrastructure (SDMX-RI)	The SDMX-Reference Infrastructure (SDMX-RI) is a set of pick-and-choose building blocks and tools that allow data to be exposed to the external world through access rights by using web services.	Socioeconomic data
seaborn	Python data visualization library that provides a high-level interface for drawing attractive and informative statistical graphics.	Socioeconomic data	Training
SeqSero2	Salmonella serotype prediction from genome sequencing data.	Pathogen characterisation	Tool info
ShigaTyper	ShigaTyper is a quick and easy tool designed to determine Shigella serotype using Illumina (single or paired-end) or Oxford Nanopore reads with low computation requirement.	Pathogen characterisation
ShigEiFinder	This is a tool that is used to identify and differentiate Shigella/EIEC using cluster-specific genes and to identify the serotype using O-antigen/H-antigen genes.	Pathogen characterisation	Tool info
SICER2	Redesigned and improved ChIP-seq broad peak calling tool SICER	Human biomolecular data
Singularity	Singularity is a widely-adopted container runtime that implements a unique security model to mitigate privilege escalation risks and provides a platform to capture a complete application environment into a single file (SIF)	Human biomolecular data	Training
SISTR	Salmonella serovar predictions from whole-genome sequence assemblies by determination of antigen gene and cgMLST gene alleles using BLAST.	Pathogen characterisation	Tool info
Snakemake	Snakemake is a framework for data analysis workflow execution	Human biomolecular data Pathogen characterisation General guidelines Human clinical and hea...	Tool info Training
SNippy	Rapid haploid variant calling and core genome alignment.	Pathogen characterisation	Tool info Training
SnpEff	Genetic variant annotation and functional effect prediction toolbox. It annotates and predicts the effects of genetic variants on genes and proteins.	Human biomolecular data Pathogen characterisation	Tool info Training
SnpSift	SnpSift annotates genomic variants using databases, filters, and manipulates genomic annotated variants.	Pathogen characterisation	Tool info
SOAPnuke	A novel analysis tool developed for quality control and preprocessing of FASTQ and SAM/BAM data.	Pathogen characterisation	Tool info
SortMeRNA	SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering.	Pathogen characterisation	Tool info Training
SPAdes	SPAdes is an assembly toolkit containing various assembly pipelines.	Human biomolecular data Pathogen characterisation	Tool info Training
spaTyper	Given a fasta file or multiple fasta files, identifies the repeats and the order and generates a spa type.	Pathogen characterisation
SRA	The SRA is NIH's primary archive of high-throughput sequencing data and is part of the International Nucleotide Sequence Database Collaboration (INSDC) that includes the NCBI Sequence Read Archive (SRA), the European Bioinformatics Institute (EBI), and the DNA Database of Japan (DDBJ). Data submitted to any of the three organizations are shared among them. SRA accepts data from all kinds of sequencing projects including clinically important studies that involve human subjects or their metagenomes, which may contain human sequences. These data often have controlled access via dbGaP (the database of Genotypes and Phenotypes).	Human biomolecular data	Tool info Standards/Databases Training
SsuisSero	This pipeline is designed to rapidly infer Streptococcus suis serotype from Oxford Nanopore data by first assembling a draft genome using Flye followed by genome polishing with racon and medaka.	Pathogen characterisation
Stanford HIV Drug Resistance Database (HIVDB)	A curated database containing nearly all published HIV RT and protease sequences: a resource designed for researchers studying evolutionary and drug-related variation in the molecular targets of anti-HIV therapy.	Pathogen characterisation
STAR	Spliced Transcripts Alignment to a Reference	Human biomolecular data Pathogen characterisation	Tool info Training
StreamFlow	Container-native workflow manager for hybrid infrastructures	General guidelines Human clinical and hea...
Swedish Pathogens Portal	The Swedish Pathogens Portal was previously known as the Swedish COVID-19 Data Portal. It is the Swedish national node of the Pathogens Portal, aimed at facilitating the sharing of data related to pathogens and pandemic preparedness.	Pathogen characterisation The Swedish Pathogens ...	Standards/Databases
TBProfiler	TBProfiler can rapidly and accurately predict anti-TB drug resistance profiles across large numbers of samples with WGS data.	Pathogen characterisation
The Cancer Genome Atlas (TCGA)	The Cancer Genome Atlas (TCGA) is a comprehensive, collaborative effort led by the National Institutes of Health (NIH) to map the genomic changes associated with specific types of tumors to improve the prevention, diagnosis and treatment of cancer. Its mission is to accelerate the understanding of the molecular basis of cancer through the application of genome analysis and characterization technologies.	Human biomolecular data	Standards/Databases Training
The Data Use Ontology (DUO)	The Data Use Ontology (DUO) describes data use requirements and limitations. DUO allows datasets to be semantically tagged with restrictions on their usage, making them discoverable automatically based on the authorization level of users, or intended usage. This resource is based on the OBO Foundry principles, and developed using the W3C Web Ontology Language. It is used in production by the European Genome-phenome Archive (EGA) at EMBL-EBI and CRG as well as the Broad Institute for the Data Use Oversight System (DUOS).	FAIR data Human biomolecular data	Standards/Databases
The European Health Information Gateway	The European Health Information Gateway is a platform that provides access to various health information resources and datasets from across Europe, including data on health systems, health determinants, and health outcomes.	Human clinical and hea...
The National health service metadata catalogue	Tool to find health data in the metadata catalogue in Portugal.	Human clinical and hea...
The Open Biological and Biomedical Ontology (OBO) Foundry	Collaborative effort to develop interoperable ontologies for the biological sciences	Human clinical and hea...	Standards/Databases
The Public Service Data Catalogue	Tool to discover the Data held by the Irish Public Service	Human clinical and hea...
tidyr	Tidy data describes a standard way of storing data that is used wherever possible throughout the [tidyverse](https://www.tidyverse.org/).	Socioeconomic data
toil-cwl-runner	The toil-cwl-runner command provides cwl-parsing functionality using cwltool, and leverages the job-scheduling and batch system support of Toil.	Human biomolecular data
Trifacta	Trifacta is designed for analysts to explore, transform, and enrich raw data into clean and structured formats.	Socioeconomic data
Trimmomatic	Trimmomatic is a tool used for the removal of adapter sequences, low-quality reads, and sequences with ambiguous bases from NGS data.	Pathogen characterisation Human biomolecular data	Tool info Training
UCSC Genome Browser	An online tool for analyzing and visualizing genomic data. It allows users to add and share annotations.	Human biomolecular data An automated SARS-CoV-...	Tool info Standards/Databases Training
Unicycler	Unicycler is an assembly pipeline for bacterial genomes. For the best possible assemblies, give it both Illumina reads and long reads, and it will conduct a short-read-first hybrid assembly.	Pathogen characterisation	Tool info Training
UNottingham Beacon	Beacon from UNottingham to query a backend OMOP database of synthetic COVID-19 patient EHRs (electronic health records).	Human biomolecular data
validate R package	Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results.	Socioeconomic data
VarScan	Variant calling and somatic mutation/CNV detection for next-generation sequencing data	Human biomolecular data	Tool info Training
VCFtools	VCFtools is a program package designed for working with VCF files.	Pathogen characterisation	Tool info
Velvet	Velvet is an algorithm package that has been designed to deal with de novo genome assembly and short read sequencing alignments.	Pathogen characterisation	Tool info Training
VEP	VEP (Variant Effect Predictor) predicts the functional effects of genomic variants.	Human biomolecular data Pathogen characterisation	Tool info Training
Viral AI	A global network for genomic surveillance and infectious disease research	Human biomolecular data
VLQ	A pipeline for lineage abundance estimation from wastewater sequencing data.	Pathogen characterisation
Webin-CLI	Command line application to submit assemblies and transcriptomes to ENA.	Using the ENA data sub...	Training
WfExS	Workflow Execution Service Backend (WfExS-backend) is a high-level orchestrator to run scientific workflows reproducibly.	General guidelines
WorkflowHub	A registry for describing, sharing and publishing scientific computational workflows.	Pathogen characterisation An automated SARS-CoV-...	Tool info Standards/Databases Training
wtdbg2	Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT).	Human biomolecular data	Tool info
X! Tandem	X! Tandem open source is software that can match tandem mass spectra with peptide sequences, in a process that has come to be known as protein identification.	Pathogen characterisation
xcms	Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data.	Pathogen characterisation	Tool info Training
XCMS Online	A systems biology tool for analyzing metabolomic data. It automatically superimposes raw metabolomic data onto metabolic pathways and integrates it with transcriptomic and proteomic data.	Human biomolecular data	Tool info
Zenodo	Zenodo is a generalist research data repository built and developed by OpenAIRE and CERN.	FAIR data Human biomolecular data	Standards/Databases Training
Epidemiology of Infectious Diseases (Epistat)	A web based application for visualising and exploring data on infectious diseases monitored by Sciensano.
COVID Epistat	Dashboard for COVID-19 epidemiological data (vaccination, laboratory testing, wastewater, variants, hospitalised patients, mortality, nursing home patients, mental health indicators, seroprevalence).
Sciensano R Shiny Apps	Shiny is an R package that makes it easy to build interactive web apps straight from R. At Sciensano, several Shiny Apps have been developed to process, analyse and visualise data during the COVID-19 crisis. These include the Surge App (monitoring and data quality of COVID-19 hospitalisations), Indicator App (COVID-19 indicators based on test positivity rates per province), Coverage App (coverage of clinical database on hospitalised patients), Hospital Indicators (forecasting and profile of hospitalised patients), and Quality of reporting (quality indicators of reporting by individual hospitals). R Shiny
R Markdown	Tool at Sciensano for generating weekly and daily reports.
Sciensano LimeSurvey	Tool for online surveys. LimeSurvey
DMPonline.be	Tool that provides templates for data management plans.
Figures on notifiable infectious diseases	Dashboard for notifiable infectious diseases in Flanders.
Galaxy Belgium	Galaxy Belgium is a Galaxy instance managed by the Belgian ELIXIR node, funded by the Flemish government, which utilises infrastructure provided by the Flemish Supercomputer Center (VSC). Galaxy
COVID-NL clinical data dashboard	The Dutch national COVID-19 clinical data dashboard allows exploration and reuse of clinical data from Dutch university medical centers (UMCs). The dashboard provides researchers with a clear overview of what is available, allows searching for specific data and makes access to such data easier when the necessary ethical and legal conditions have been met. The policy for access to and sharing of clinical COVID-19 data is described in the HRI COVID policy document.		Standards/Databases
COVID-NL metadata portal	The Dutch national COVID-19 metadata portal describes the content of the collections and the type of data. The underlying data remains at the source, but where possible a link to the data or the data request procedure is provided on the portal. The first health care data sets in the portal come from observational studies funded by ZonMw, NFU COVID-19 clinical research data, collaborating top clinical hospitals (STZ), as well as other regional hospitals. However, the portal is open to any health care provider wishing to make their COVID-19 data available for research.		Standards/Databases
ODISSEI	The Open Data Infrastructure for Social Science and Economic Innovations (ODISSEI) is the national research infrastructure for the social sciences in the Netherlands.		Standards/Databases
DANS Data Station Life Sciences	The Data Archiving and Networked Services (DANS) is the Dutch national centre of expertise and repository for research data. This data station allows you to deposit and search for data within the fields of medical, health and green life sciences.		Standards/Databases
DANS Data Station Social Sciences and Humanities	The Data Archiving and Networked Services (DANS) is the Dutch national centre of expertise and repository for research data. This data station allows you to deposit and search for data within the social sciences and humanities.
ELSI servicedesk	The ELSI Servicedesk provides guidance on, and answers to questions about, the ethical, legal and social implications of research on personalised medicine and next generation sequencing that life science professionals, policymakers and patients face.
SARS-CoV-2 Database	Norwegian SARS-CoV-2 database		Standards/Databases
Folkehelseinstituttet (FHI)	Norwegian Institute of Public Health (NIPH) portal for infectious disease information
FEGA Norway	Federated European Genome-phenome Archive (EGA) node European Genome-phenome Archive (EGA)
Swedish Pathogens Portal	The Swedish Pathogens Portal is a hub for data, tools, services, and other resources centred around pathogens, such as SARS-CoV-2, and pandemic preparedness in Sweden.
Swiss Pathogen Surveillance Platform (SPSP)	SPSP is a secure One Health online platform that enables near real-time sharing, under controlled access, of pathogen genomic data and their associated clinical/epidemiological metadata. During COVID-19, it served as the Swiss SARS-CoV-2 genomic data hub, collecting data, annotating it, communicating reports to the federal public health authorities and openly re-sharing anonymised data on the COVID-19 Data Platform.	Pathogen characterisation
Research Data Management (RDM) sources in Switzerland	RDMkit page on Switzerland’s RDM guidelines and resources.
COVID-19 Data Portal aggregation of Swiss COVID-19 data	COVID-19 Data Portal aggregation of Swiss COVID-19 data
V-pipe	V-pipe is a bioinformatics pipeline that integrates various open-source software packages for assessing viral genetic diversity from next-generation sequencing (NGS) data derived from intra-host virus populations.	Pathogen characterisation
ViralZone	ViralZone is a SIB Swiss Institute of Bioinformatics web resource for all viral genera and families, providing general molecular and epidemiological information, along with virion and genome figures. Each virus or family page gives easy access to UniProtKB/Swiss-Prot viral protein entries.	Pathogen characterisation
Nextstrain	Nextstrain is an open-source project to harness the scientific and public health potential of pathogen genome data. It provides a continually-updated view of publicly available data alongside powerful analytic and visualization tools for use by the community. The goal is to aid epidemiological understanding and improve outbreak response.	Pathogen characterisation
CoV Spectrum	CoV-Spectrum is an interactive tool to analyze and discover variants of SARS-CoV-2. Main features include a powerful search engine that supports amino acid and nucleotide mutation filtering, the comparison of multiple variants, and a built-in fitness advantage estimation model.	Pathogen characterisation
Covariants	CoVariants provides an overview of SARS-CoV-2 variants and mutations that are of interest. It displays what mutations define a variant, what impact they might have (with links to papers and resources), and where variants are found, and it links to the variants in Nextstrain.	Pathogen characterisation
COVTriage	COVTriage is a search engine developed as part of SIBiLS (Swiss Institute of Bioinformatics Literature Services), whose purpose is to rank the COVID-19 literature (Medline, PMC, Cord-19) according to the 9 axes of the COVoc ontology (controlled vocabulary to support literature triage for COVID-19). This resource supports COVID-19 / SARS-CoV-2 research.	Pathogen characterisation
Computational Linguistics for COVID-19	Automatically processes COVID-19-related scientific publications to detect mentions of domain-specific entities of particular relevance (such as genes, symptoms, drugs and organs). This enhances accessibility to the literature, for example by simplifying the search for papers dealing with a particular gene or by identifying unexpected connections between different entities.	Pathogen characterisation