School of Data & AI | Data Engineering | Beginner to Intermediate | Included in a Professional Diploma

Python for Data Engineering

Build the Python skills behind practical data pipelines and automation.

Learn how to use Python to extract data, process files, call APIs, validate records, connect to databases, automate workflows, and prepare data for analytics and engineering systems.

Duration

7 weeks - 6-8 hours/week

Project

Write Python scripts for data engineering tasks.

Support

Pricing and enrolment are handled through the Professional Diploma.

Overview

A practical Short Course built around a visible project.

Learn the Python skills data engineers use to move, clean, transform, validate, and automate data across files, APIs, databases, and pipeline workflows.

Write Python scripts for data engineering tasks.

Read, write, and process CSV, JSON, Excel, and structured data files.

Extract data from APIs and external sources.

Clean, transform, and validate data with Python.

Connect Python scripts to databases.

Handle errors, logs, and failed data processes more professionally.

Build reusable data processing functions.

Automate repetitive data movement and preparation tasks.

Prepare data for pipelines, warehouses, and analytics systems.

Build portfolio-ready Python data engineering projects.

Course roadmap

What you will work through.

The sequence below is specific to this course. It shows the phases, modules, lessons, and page outlines that move you toward the course project: writing Python scripts for data engineering tasks.

Phase 1 - Python Foundations for Data Engineering

Build Python foundations specifically for data engineering: pipeline mindset, environment setup, scripts, syntax, control flow, reusable logic, and error handling.

2 modules, 9 lessons, 1–2 weeks

Module 1: Python for Data Engineering Mindset

Understand Python's role in data engineering and set up a professional workspace for script-based workflows.

4 lessons

Lesson 1: What Python Does in Data Engineering

Understand Python as the glue language for extraction, transformation, validation, automation, and pipeline reliability.

85 min, article, 5 pages

Welcome and Learning Objectives

Introduce Python's role in data engineering.

8 min

Python as Pipeline Glue

Explain why Python is used in data workflows.

18 min

Python vs SQL, BI, dbt, Airflow and Warehouses

Clarify tool responsibilities.

22 min

Where Python Fits in Analytics, AI and Data Platforms

Connect Python to later data engineering path courses.

18 min

Exercise - Workflow Tool Decision Matrix

Students decide which tools should handle parts of a workflow.

19 min

Lesson 2: Development Environment Setup

Set up a professional Python data engineering workspace using Python, VS Code, terminal, virtual environments, pip, requirements files, and project folders.

85 min, article, 5 pages

Welcome and Learning Objectives

Introduce environment setup.

8 min

Python, VS Code and Terminal Basics

Explain the core tools.

20 min

Virtual Environments and Requirements

Explain dependency isolation.

20 min

Project Folder Structure

Introduce a simple data engineering layout.

18 min

Exercise - Python Data Engineering Workspace Setup

Students set up their workspace.

19 min

Lesson 3: Running Python Programs

Run Python scripts from the terminal, understand command-line inputs, distinguish notebooks from scripts, and build a simple input-output program.

85 min, article, 5 pages

Welcome and Learning Objectives

Introduce script execution.

8 min

Scripts vs Notebooks

Explain when to use scripts and notebooks.

18 min

Running Scripts from Terminal

Teach basic execution flow.

20 min

Command-Line Inputs

Introduce input arguments conceptually.

18 min

Exercise - Input Output Script

Students create and run a simple program.

21 min

Lesson 4: Python Syntax Essentials

Learn variables, data types, strings, numbers, booleans, comments, naming conventions, and constants for pipeline configuration.

80 min, article, 4 pages

Welcome and Learning Objectives

Introduce syntax essentials.

8 min

Variables and Data Types

Explain basic Python values in data engineering context.

20 min

Comments, Naming and Constants

Teach readable syntax habits.

18 min

Exercise - Pipeline Configuration Variables

Students create simple configuration variables.

34 min

Module 2: Control Flow and Reusable Logic

Use conditions, loops, functions, and error handling to build reusable pipeline logic.

5 lessons

Lesson 1: Conditions for Data Rules

Use if, elif, else, comparison operators, validation rules, and branching logic for record classification.

55 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

27 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

20 min

Lesson 2: Loops for Batch Processing

Use for loops, while loops, file loops, and record loops, and learn to avoid inefficient loops.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 3: Functions for Pipeline Logic

Design reusable functions using parameters, returns, pure functions, side effects, and helpers.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 4: Error Handling

Use try/except to handle common data errors, fail safely, write useful error messages, and choose between fail-fast and continue-safely strategies.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min
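To make the fail-fast vs continue-safely distinction from this lesson concrete, here is a minimal sketch. The function names (`parse_amount`, `process_fail_fast`, `process_continue_safely`) and the sample rows are hypothetical and are not taken from the course materials.

```python
def parse_amount(raw):
    """Convert a raw string such as ' 12.50 ' to a float (hypothetical helper)."""
    return float(raw.strip())


def process_fail_fast(rows):
    """Fail-fast: the first bad row raises and stops the whole batch."""
    return [parse_amount(raw) for raw in rows]


def process_continue_safely(rows):
    """Continue-safely: keep the good rows and collect the bad ones with a reason."""
    good, errors = [], []
    for index, raw in enumerate(rows):
        try:
            good.append(parse_amount(raw))
        except (ValueError, AttributeError) as exc:
            # ValueError: text that is not a number; AttributeError: None has no .strip()
            errors.append({"row": index, "value": raw, "error": str(exc)})
    return good, errors


rows = ["10.5", " 3 ", "abc", None, "7"]
good, errors = process_continue_safely(rows)
print(good)
print([e["row"] for e in errors])
```

Fail-fast tends to suit loads where a single bad value makes the whole batch untrustworthy; continue-safely suits loads where bad rows can be set aside and reviewed later.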

Lesson 5: Mini Project 1 - Data File Processor

Build a Python script that processes multiple CSV files and writes a processing summary.

100 min, article, 2 pages

Project Brief

Explain the project scenario and expected output.

20 min

Review Checklist

Checklist for project quality.

20 min

Phase 2 - Working with Files and Data Formats

Read, write, discover, parse, combine, and export files across text, CSV, JSON, Excel, logs, and DataFrames.

3 modules, 13 lessons, 2 weeks

Module 1: File Systems and Data Ingestion

Use file paths, directories, batch processing, file metadata, and safe file movement.

4 lessons

Lesson 1: File Paths and Directories

Use absolute paths, relative paths, pathlib, folders, file naming, and file discovery.

55 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

27 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

20 min

Lesson 2: Reading and Writing Text Files

Use open, read, write, append, encoding, and newline handling.

55 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

27 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

20 min

Lesson 3: Batch File Processing

Process folders with file loops, manage input/output and processed/archive folders, and avoid accidental overwrites.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 4: File Metadata

Capture file size, created/modified time, extension, source system, batch date, and load time.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Module 2: Structured Data Formats

Work with CSV, JSON, Excel, and log/semi-structured formats.

4 lessons

Lesson 1: CSV Files

Understand CSV structure, delimiters, headers, missing values, malformed rows, and encoding issues.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 2: JSON Data

Work with JSON objects, arrays, nested JSON, API-style JSON, and flattening concepts.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 3: Excel Files

Process multiple sheets, sheet names, and inconsistent headers, and write Excel outputs.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 4: Logs and Semi-Structured Data

Parse server and application logs, timestamps, and patterns, and extract structured fields.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Module 3: DataFrames for Data Engineering

Use Pandas carefully for loading, schema inspection, combining files, and exporting outputs.

5 lessons

Lesson 1: Pandas for Data Engineering

Use DataFrames for loading data, inspecting schema, data types, and memory awareness.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 2: Schema Awareness

Detect expected columns, unexpected columns, missing columns, column ordering, data types, and schema drift.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min
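As an illustration of the column checks this lesson names, the following sketch compares an expected column list with the columns actually received. It uses plain Python lists rather than a DataFrame, and the function name `compare_schema` and the column names are hypothetical.

```python
def compare_schema(expected, actual):
    """Compare an expected column list with the columns actually received."""
    expected_set, actual_set = set(expected), set(actual)
    # Compare ordering only on the columns both sides share
    shared_in_actual_order = [c for c in actual if c in expected_set]
    shared_in_expected_order = [c for c in expected if c in actual_set]
    return {
        "missing": sorted(expected_set - actual_set),
        "unexpected": sorted(actual_set - expected_set),
        "order_matches": shared_in_actual_order == shared_in_expected_order,
    }


expected = ["order_id", "order_date", "amount"]
actual = ["order_id", "amount", "order_date", "coupon_code"]
report = compare_schema(expected, actual)
print(report)
```

With a DataFrame, the same function could be given `list(df.columns)` as the `actual` argument; data type checks would need a separate comparison.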

Lesson 3: Combining Files

Concatenate files, append daily batches, track source, use batch IDs, and manage duplicate risk.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 4: Exporting Data

Export CSV, JSON, Excel, partitioned outputs, and batch-date filenames.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 5: Milestone Project 1 - Multi-Format Ingestion Pipeline

Build an ingestion pipeline for CSV, JSON, and Excel files with schema validation and reporting.

120 min, article, 2 pages

Project Brief

Explain the project scenario and expected output.

20 min

Review Checklist

Checklist for project quality.

20 min

Phase 3 - Data Transformation and Validation

Clean, transform, validate, quarantine, and report on data quality in Python pipelines.

3 modules, 14 lessons, 2 weeks

Module 1: Data Cleaning for Engineering Workflows

Clean common business data problems in text, dates, numeric fields, and identifiers.

4 lessons

Lesson 1: Common Data Quality Problems

Identify missing values, duplicates, inconsistent categories, invalid dates, invalid numeric values, broken IDs, and out-of-range values.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 2: Cleaning Text Fields

Trim whitespace and standardize casing, categories, special characters, names, and null-like strings.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 3: Date and Time Handling

Parse dates, handle timezone basics, date formats, invalid dates, year/month/day extraction, and batch dates.

70 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

35 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

27 min

Lesson 4: Numeric Transformation

Clean currency fields, percentages, negative values, rounding, invalid numeric strings, and type conversion.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Module 2: Data Transformation Patterns

Apply mapping, business rules, merges, aggregations, and incremental processing concepts.

5 lessons

Lesson 1: Mapping and Standardization

Use lookup maps, category mapping, code mapping, region mapping, and product mapping.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 2: Filtering and Business Rules

Filter for active records and valid transactions, and apply excluded statuses, date windows, and other business filters.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 3: Joins and Merges in Python

Merge DataFrames, manage join keys, one-to-many issues, missing matches, and duplicate keys.

70 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

35 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

27 min

Lesson 4: Aggregations

Use groupby counts, sums, averages, min/max, grouped outputs, and summary tables.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 5: Incremental Processing Concepts

Understand full load, incremental load, batch date, new records, changed records, and late-arriving data.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Module 3: Data Validation and Quality Checks

Build validation rules, quality reports, rejection outputs, and framework-ready checklists.

5 lessons

Lesson 1: Validation Rules

Check required fields, unique keys, accepted values, date ranges, numeric ranges, and foreign keys.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 2: Data Quality Reports

Create row counts, null counts, duplicate counts, invalid records, warnings, and failure thresholds.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 3: Quarantine Bad Records

Separate valid records from rejected records, and capture rejection reasons, error files, and an audit trail.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min
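The quarantine pattern this lesson describes can be sketched as follows: each record is checked, valid records pass through, and rejected records keep their original values plus a reason. The rules, field names (`order_id`, `amount`), and sample data are assumptions made for illustration only.

```python
import csv
import io

REQUIRED = ("order_id", "amount")  # hypothetical required fields


def rejection_reason(record):
    """Return a reason string if the record is invalid, otherwise None."""
    for field in REQUIRED:
        if not (record.get(field) or "").strip():
            return f"missing {field}"
    try:
        if float(record["amount"]) < 0:
            return "negative amount"
    except ValueError:
        return "amount is not numeric"
    return None


def split_records(records):
    """Split records into valid rows and rejected rows annotated with a reason."""
    valid, rejected = [], []
    for record in records:
        reason = rejection_reason(record)
        if reason is None:
            valid.append(record)
        else:
            rejected.append({**record, "rejection_reason": reason})
    return valid, rejected


raw = "order_id,amount\nA1,10.00\n,5.00\nA3,abc\nA4,-2\n"
records = list(csv.DictReader(io.StringIO(raw)))
valid, rejected = split_records(records)
print(len(valid), [r["rejection_reason"] for r in rejected])
```

Keeping the original values next to the reason means a rejected row can be corrected and reloaded without going back to the source file.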

Lesson 4: Great Expectations and Validation Framework Concepts

Understand expectations, validation suites, automated checks, data contracts, and when frameworks help.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 5: Milestone Project 2 - Data Cleaning and Quality Pipeline

Build a pipeline that cleans, validates, rejects bad records, reports quality, and writes clean outputs.

120 min, article, 2 pages

Project Brief

Explain the project scenario and expected output.

20 min

Review Checklist

Checklist for project quality.

20 min

Phase 4 - Databases and SQL with Python

Connect Python to relational databases, read/write SQL data, load staged data, and reconcile pipeline results.

3 modules, 13 lessons, 1–2 weeks

Module 1: Database Integration Foundations

Understand database use cases and connect Python safely to databases.

4 lessons

Lesson 1: Why Data Engineers Use Databases

Compare files, transactional databases, analytical databases, staging tables, raw/clean layers, and loading pipelines.

55 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

27 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

20 min

Lesson 2: Connecting Python to Databases

Use connection strings, credentials, environment variables, connection safety, drivers, and SQLAlchemy concepts.

70 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

35 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

27 min

Lesson 3: Reading Data from SQL

Use SELECT queries, read results into DataFrames, apply query parameters and limits, and avoid accidental full-table reads.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 4: Writing Data to SQL

Use insert, append, replace, staging tables, bulk-load concepts, and data type mapping.

70 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

35 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

27 min

Module 2: Database Loading Patterns

Use staging tables, upserts, audit columns, and safe write patterns.

4 lessons

Lesson 1: Staging Tables

Use staging tables for raw loads, validation after load, temporary tables, and audit columns.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 2: Upserts and Deduplication

Understand insert vs update, natural keys, surrogate keys, duplicate handling, and conflict resolution.

70 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

35 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

27 min

Lesson 3: Audit Columns and Load Tracking

Add batch ID, source file, loaded_at, processed_at, record status, and error reason.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 4: Transactions and Safe Writes

Use commits and rollbacks, handle partial failures, and apply idempotency concepts for safe reruns.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min
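A small sketch of commits, rollbacks, and safe reruns, using the standard-library `sqlite3` module as a stand-in for whichever database the course uses. The `sales` table, the `load_batch` function, and the delete-then-insert rerun strategy are assumptions chosen for illustration.

```python
import sqlite3


def load_batch(conn, batch_id, rows):
    """Load one batch atomically; rerunning the same batch_id replaces it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("DELETE FROM sales WHERE batch_id = ?", (batch_id,))
            conn.executemany(
                "INSERT INTO sales (batch_id, order_id, amount) VALUES (?, ?, ?)",
                [(batch_id, order_id, amount) for order_id, amount in rows],
            )
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "batch_id TEXT NOT NULL, order_id TEXT NOT NULL, amount REAL NOT NULL)"
)

load_batch(conn, "2024-01-01", [("A1", 10.0), ("A2", 5.0)])
load_batch(conn, "2024-01-01", [("A1", 10.0), ("A2", 5.0)])  # rerun of the same batch

# The second row violates NOT NULL, so the whole batch is rolled back
ok = load_batch(conn, "2024-01-02", [("B1", 1.0), ("B2", None)])
count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(ok, count)
```

Because the delete and the inserts share one transaction, a failed rerun leaves the previously loaded batch untouched instead of leaving the table half-updated.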

Module 3: Database Validation and Reconciliation

Validate loads using row counts, key checks, freshness checks, and run summary tables.

5 lessons

Lesson 1: Row Count Reconciliation

Compare source, loaded, and rejected counts to detect mismatches.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 2: Duplicate and Key Checks

Validate primary keys, unique keys, duplicate records, and referential checks.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 3: Data Freshness Checks

Check latest load date, missing batch, stale data, and source delays.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 4: Load Summary Tables

Store pipeline run logs, status, record counts, duration, and errors.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 5: Milestone Project 3 - File-to-Database Loading Pipeline

Build a pipeline that validates daily files, loads clean data into database tables, stores rejected records, and records load metadata.

130 min, article, 2 pages

Project Brief

Explain the project scenario and expected output.

20 min

Review Checklist

Checklist for project quality.

20 min

Phase 5 - Building Batch Data Pipelines

Design batch pipelines with extraction, transformation, loading, configuration, idempotency, logging, and run reports.

3 modules, 13 lessons, 2 weeks

Module 1: Pipeline Design Fundamentals

Understand pipeline flows, batch vs streaming, layers, and safe reruns.

4 lessons

Lesson 1: What Is a Data Pipeline?

Understand source, extraction, transformation, loading, validation, monitoring, and downstream consumers.

55 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

27 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

20 min

Lesson 2: Batch vs Streaming

Compare batch pipelines, streaming pipelines, scheduled jobs, real-time needs, and when batch is enough.

55 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

27 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

20 min

Lesson 3: Pipeline Layers

Design raw, staging, cleaned, curated, reporting, and audit layers.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 4: Idempotency and Reruns

Build rerunnable pipelines using duplicate prevention, deterministic outputs, batch IDs, and overwrite vs append logic.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Module 2: Extraction Patterns

Extract data from files, APIs, and databases with pagination, high-watermarks, and logs.

4 lessons

Lesson 1: File Extraction

Handle file drops, naming conventions, batch folders, archive folders, missing files, and validation.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 2: API Extraction

Handle API pagination, date filters, incremental extraction, authentication, rate limits, and retries.

75 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

37 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

30 min

Lesson 3: Database Extraction

Use SQL extraction, incremental queries, updated_at fields, the high-watermark concept, and performance basics.

70 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

35 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

27 min
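The high-watermark idea from this lesson can be sketched with the standard-library `sqlite3` module: each run extracts only rows whose `updated_at` is later than the stored watermark, then advances the watermark. The `customers` table, the `extract_since` function, and the ISO-8601 text timestamps are assumptions for illustration.

```python
import sqlite3


def extract_since(conn, watermark):
    """Return rows changed after `watermark`, plus the new watermark to store."""
    rows = conn.execute(
        "SELECT id, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Keep the old watermark when nothing new arrived
    new_watermark = rows[-1][1] if rows else watermark
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        (1, "2024-01-01T09:00:00"),
        (2, "2024-01-02T09:00:00"),
        (3, "2024-01-03T09:00:00"),
    ],
)

first, watermark = extract_since(conn, "2024-01-01T12:00:00")
second, watermark_after = extract_since(conn, watermark)
print(first, watermark, second)
```

ISO-8601 timestamps in a single format compare correctly as text, which is what makes the `>` filter work here. A strict `>` can miss rows that share the exact watermark timestamp, so some pipelines re-read a small overlap window and deduplicate instead.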

Lesson 4: Extraction Logging

Log source, start/end time, records extracted, errors, retry count, and next cursor/high-watermark.

60 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Module 3: Transformation and Loading Patterns

Build reusable transformations, choose load strategies, use configuration, and generate run reports.

5 lessons

Lesson 1: Transformation Functions

Refactor clean, map, standardize, and validate steps into testable transformation functions.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 2: Load Strategies

Choose append, overwrite, merge/upsert, partitioned loads, staging-to-final, and failure recovery.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 3: Pipeline Configuration

Use config files, environment variables, source configs, table configs, schedule configs, and reusable pipelines.

70 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

35 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

27 min

Lesson 4: Pipeline Reports

Generate run summary, data quality summary, load summary, error summary, and stakeholder notification.

65 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 5: Milestone Project 4 - End-to-End Batch Pipeline

Build a batch pipeline that extracts, transforms, validates, loads, logs, supports reruns, and reports pipeline results.

140 min, article, 2 pages

Project Brief

Explain the project scenario and expected output.

20 min

Review Checklist

Checklist for project quality.

20 min

Phase 6 - Reliability, Logging and Project Structure

Improve pipeline reliability through logging, observability, alerts, debugging, testing, configuration, documentation, and collaboration.

3 modules, 13 lessons, 1–2 weeks

Module 1: Logging, Monitoring and Alerts

Add operational visibility to pipelines.

4 lessons

Lesson 1: Logging Fundamentals

Replace print statements with logging, log levels, log files, structured logs, and useful messages.

55 min, article, 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

27 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

20 min

Lesson 2: Pipeline Observability
Track run status, row counts, duration, error counts, quality failures, and freshness.
65 min - article - 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 3: Alerts and Notifications
Design failure, warning, missing-file, and data quality alerts, and choose notification channels.
60 min - article - 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 4: Debugging Pipelines
Read logs, trace failures, isolate bad data, reproduce errors, then fix and rerun.
70 min - article - 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

35 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

27 min

Module 2: Testing Data Engineering Code
Test functions, pipeline components, data quality checks, and regression bugs.
4 lessons

Lesson 1: Testing Python Functions
Write unit tests, test cases, expected outputs, edge cases, and pytest basics.
60 min - article - 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 2: Testing Pipeline Components
Test extract, transform, and load steps independently using fake inputs and sample outputs.
65 min - article - 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 3: Data Quality Tests
Test schema, nulls, uniqueness, accepted values, and row counts.
65 min - article - 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 4: Regression Testing for Pipelines
Prevent old bugs using test datasets, expected files, rerun checks, and safe refactoring.
65 min - article - 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Module 3: Professional Project Structure
Structure repositories, manage config/secrets, document pipelines, and collaborate with Git.
5 lessons

Lesson 1: Data Engineering Repository Structure
Organize src, configs, raw/processed data, tests, logs, scripts, and docs.
60 min - article - 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 2: Configuration and Secrets
Use config files, .env, credentials, .gitignore, secret safety, and environment-specific configs.
65 min - article - 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 3: Documentation
Write a README, pipeline overview, setup guide, source documentation, data dictionary, and runbook.
65 min - article - 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

32 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

25 min

Lesson 4: Git and Collaboration
Use commits, branches, pull requests, and code reviews; keep work reproducible and document changes.
60 min - article - 3 pages

Overview and Learning Objectives

Introduce the lesson and clarify expected outcomes.

8 min

Concepts and Professional Workflow

Explain the concept through a realistic data engineering workflow.

30 min

Practice Activity

Apply the lesson through a guided data engineering exercise.

22 min

Lesson 5: Mini Project 2 - Pipeline Reliability Upgrade
Improve a rough pipeline by adding logging, configuration, tests, documentation, error handling, and structure.
110 min - article - 2 pages

Project Brief

Explain the project scenario and expected output.

20 min

Review Checklist

Checklist for project quality.

20 min

Phase 7 - Capstone
Complete a production-aware Python data engineering capstone pipeline.
1 module - 3 lessons - 1 week

Module 1: Python Data Engineering Capstone
Build a production-aware Python data pipeline that collects, cleans, validates, loads, logs, tests, and documents data.
3 lessons

Lesson 1: Capstone Options
Choose a realistic data engineering capstone option.
55 min - article - 1 page

Choose Your Python Data Engineering Capstone

Review approved capstone options.

55 min

Lesson 2: Final Capstone - Python Data Engineering Capstone
Build a production-aware Python data pipeline that collects, cleans, validates, loads, logs, tests, and documents data from one or more sources.
180 min - article - 2 pages

Project Brief

Explain the project scenario and expected output.

20 min

Review Checklist

Checklist for project quality.

20 min

Lesson 3: Graduation Requirements and Portfolio Outcome
Clarify completion requirements, portfolio outcomes, path position, and why the course matters.
55 min - article - 1 page

Requirements and Portfolio Checklist

Summarize graduation requirements and portfolio assets.

55 min

Tools and skills

Build skill with the tools used in the work.

  • Write Python scripts for data engineering tasks.
  • Read, write, and process CSV, JSON, Excel, and structured data files.
  • Extract data from APIs and external sources.
  • Clean, transform, and validate data with Python.
  • Connect Python scripts to databases.
  • Handle errors, logs, and failed data processes more professionally.
  • Build reusable data processing functions.
  • Automate repetitive data movement and preparation tasks.
  • Prepare data for pipelines, warehouses, and analytics systems.
  • Build portfolio-ready Python data engineering projects.

Projects and exercises

  • Write Python scripts for data engineering tasks.
  • Structured exercises
  • Portfolio practice

Resources included

  • Course resources
  • Project guidance

Who this is for

  • Learners building practical tech skills

Prerequisites

  • A willingness to practice consistently

Career relevance

Python for Data Engineering supports practical career readiness.

Related Professional Diploma

Data Engineering

Learn how to build the pipelines, data models, warehouses, orchestration workflows, and cloud data systems that power analytics, reporting, machine learning, and AI products.

FAQ

Questions about this Short Course.

Short Course answers about scope, projects, support, and next steps.

Yes. Python for Data Analytics focuses on analysis, exploration, and insight generation. Python for Data Engineering focuses on data movement, automation, validation, pipelines, files, APIs, and database workflows.
Related Short Courses

Continue building connected skills.

School of Data & AI - Data Analytics - Beginner to Intermediate

SQL for Data Analytics

Query databases, join tables, summarize records, and uncover business insights with SQL.

Learn the SQL skills data analysts use to extract, filter, join, group, and analyze data from relational databases.

From ₦65,000
7 weeks - 6-8 hours/week
Understand tables, columns, rows, keys, and relationships.
Project included
Mentor review available

Related Professional Diploma

Data Engineering

School of Data & AI - Data & AI - Beginner to Intermediate

Excel for Data Analytics

Turn raw spreadsheets into clean analysis, useful reports, and business-ready insights.

Master the Excel skills used by data analysts to clean, organize, calculate, summarize, visualize, and report business data with confidence.

From ₦50,000
6 weeks - 5-8 hours/week
Clean and organize messy spreadsheet data.
Project included
Mentor review available
School of Data & AI - Data Analytics - Intermediate

Power BI for Business Intelligence

Build interactive dashboards and business reports that make performance clear.

Learn to connect, clean, model, measure, visualize, and present business data using Power BI.

From ₦85,000
8 weeks - 6-8 hours/week
Connect Power BI to different data sources.
Project included
Mentor review available
Professional Diploma application

Continue through Data Engineering.

This course is included in a Professional Diploma, so tuition and enrolment are handled after you complete the diploma application.