WGS Masterclass: Variant Calling & Genome Assembly Pipelines (Recorded Course)

Master the architecture of end-to-end WGS workflows for de novo assembly and precision variant discovery. Harness AI-accelerated computational tools to transform raw genomic reads into high-fidelity mapped genomes.

Webinar, Self-Paced, Dr. Omics
Language: English
Level: All Levels
Updated: Jun 2026

Course Description

As sequencing costs fall, the bottleneck has shifted from generating data to analyzing Whole Genome Sequencing (WGS) data. This Masterclass covers the computational strategies behind both de novo genome assembly and reference-based variant calling. You will work through the practical differences between short-read (Illumina) and long-read (PacBio, Oxford Nanopore) data, and use AI-optimized assemblers to resolve complex repetitive regions. The curriculum spans the bioinformatic stack, from raw signal processing and basecalling to the identification of Structural Variants (SVs) and Copy Number Variations (CNVs). Participants gain hands-on experience with machine learning models designed to reduce false positives in clinical diagnostics, and learn to automate pipeline orchestration on cloud-native platforms so that analyses scale to population-level genomics. The course is aimed at researchers who want to use whole-genome data to study the genetic basis of complex diseases and evolutionary history.

What You'll Learn

The fundamental differences between Short-Read and Long-Read WGS technologies.

Advanced De Novo Assembly strategies using graph-based AI algorithms.

High-precision Variant Calling workflows for SNPs, Indels, and Structural Variants.

How to use Deep Learning-based callers like DeepVariant for superior accuracy.

Techniques for Genome Polishing and scaffolding to achieve chromosomal-level assemblies.

Managing large-scale genomic data using Parallel Processing and Cloud HPC tools.

Curriculum

  • Module 1: Foundations of WGS & High-Throughput Linux Computing
      • Overview of Whole Genome Sequencing (WGS) architectures, platforms (Illumina, PacBio, Oxford Nanopore), and file standards.
      • Setting up a standard bioinformatic Conda environment and working inside the Linux terminal.
      • Managing high-throughput raw sequence files: understanding compressed FASTQ files, their Phred quality score encoding, and metadata headers.
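As a taste of the FASTQ material in Module 1, here is a minimal sketch of how quality characters become scores. It assumes the Phred+33 encoding, where each character maps to Q = ord(char) - 33 and the error probability is 10^(-Q/10). The record and the function names are made up for illustration and are not taken from the course materials.

```python
def phred33_scores(quality_line):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(q):
    """Convert a Phred score Q into the probability that the base call is wrong."""
    return 10 ** (-q / 10)


# Hypothetical four-line FASTQ record, invented for this example.
record = [
    "@read_001 sample=demo",  # header: read identifier plus optional metadata
    "GATTACA",                # base calls
    "+",                      # separator line
    "IIII5+!",                # one quality character per base
]

scores = phred33_scores(record[3])
mean_q = sum(scores) / len(scores)

print(scores)                 # per-base Phred scores
print(round(mean_q, 2))       # mean quality of the read
print(error_probability(20))  # Q20 corresponds to a 1-in-100 error rate
```

A score of 30 (character `?` under Phred+33) corresponds to an expected error rate of 1 in 1000, which is why Q30 is a common quality cutoff.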
  • Module 2: Raw Read Quality Control & Alignment Operations
      • High-throughput quality control: generating read metrics with FastQC and compiling batch reports via MultiQC.
      • Automated adapter trimming, low-quality base clipping, and sequence cleaning with Trimmomatic.
      • Indexing reference genomes and executing high-fidelity read alignment using the BWA-MEM algorithm.
      • Post-alignment processing: sorting BAM files, adding read groups, and marking optical/PCR duplicates with Picard.
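Duplicate marking in Module 2 does not delete reads. It sets a bit in the FLAG column of each alignment record. The sketch below decodes that column using the bit values defined in the SAM specification. The helper names and the example flag value are hypothetical, chosen only to illustrate the idea.

```python
# Bit values defined by the SAM specification for the FLAG column.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_duplicate(flag):
    """True if the duplicate bit (0x400) is set."""
    return bool(flag & 0x400)


# Hypothetical flag: 99 is a common value for a properly paired first read;
# adding 1024 sets the duplicate bit.
example_flag = 99 + 1024
print(decode_flag(example_flag))
print(is_duplicate(example_flag))
```

Downstream tools can then skip flagged reads without the BAM file losing any data.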
  • Module 3: GATK4 Variant Calling Pipeline
      • Base Quality Score Recalibration (BQSR): modeling systematic errors using known variant sites (dbSNP, 1000 Genomes).
      • Variant Discovery: running GATK4 HaplotypeCaller to generate individual genomic VCF (gVCF) records.
      • Joint Genotyping and variant quality filtering: comparing hard-filtering thresholds with machine-learning-based VQSR.
      • Parsing, slicing, and manipulation of the Variant Call Format (VCF) using BCFtools and VCFtools.
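To make the hard-filtering idea from Module 3 concrete, the sketch below parses the INFO column of a VCF record and reports which annotation thresholds it fails. The thresholds are the starting-point SNP values commonly cited in GATK hard-filtering guidance and should be treated as assumptions to tune per dataset. The records and function names are invented for illustration. In practice this step would normally be run with GATK or BCFtools, not hand-written code.

```python
# Assumed starting-point SNP thresholds; each lambda returns True on failure.
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "SOR": lambda v: v > 3.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def parse_info(info_field):
    """Split a VCF INFO column into a dict; bare flags map to True."""
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def failed_filters(vcf_line):
    """Return the annotation names on which a VCF record fails."""
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the eighth VCF column
    failed = []
    for key, fails in SNP_FILTERS.items():
        # In this sketch, annotations missing from the record are skipped.
        if key in info and fails(float(info[key])):
            failed.append(key)
    return failed


# Hypothetical records, invented for this example.
good = "chr1\t10177\t.\tA\tC\t812.77\t.\tQD=25.4;FS=1.2;MQ=60.0;SOR=0.9"
bad = "chr1\t10352\t.\tT\tA\t35.10\t.\tQD=1.3;FS=71.5;MQ=38.2;SOR=3.6"

print(failed_filters(good))
print(failed_filters(bad))
```

VQSR replaces these fixed cutoffs with a model trained on known variant sites, which is the trade-off the joint genotyping lesson compares.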
  • Module 4: Genome Assembly & Downstream Functional Annotation
      • Core concepts of de novo genome assembly: understanding Overlap-Layout-Consensus (OLC) and De Bruijn graph mechanics.
      • Executing short-read and hybrid genome assemblies using tools such as SPAdes and Unicycler.
      • Assembly validation benchmarks: interpreting assembly contiguity via QUAST metrics (N50, L50, contig counts).
      • Automated genome annotation: predicting genes, tRNA features, and metabolic tracks using the Bakta and Prokka pipelines.
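The N50 and L50 metrics named in Module 4 can be computed by hand, which helps when interpreting a QUAST report. This is a minimal sketch using the standard definitions: N50 is the length of the contig at which the running total, taken from longest to shortest, first reaches half the assembly size, and L50 is the number of contigs needed to get there. The function name and contig lengths are hypothetical.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            # This contig pushes the cumulative length past the halfway mark.
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Hypothetical assembly: five contigs totalling 300 bases.
contigs = [100, 80, 60, 40, 20]
n50, l50 = n50_l50(contigs)
print(n50, l50)
```

A higher N50 with a lower L50 indicates a more contiguous assembly, but neither metric says anything about correctness, so they are best read alongside the other QUAST statistics.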