Deployed a data ingestion pipeline that preprocesses data from a data lake and loads it into a data warehouse.
ETL Data Pipelining • Preprocessing • Docker • Data Ingestion • Data Warehouse • Scalability • Postgres • SQL • Data Lake
Real Data: The pipeline ingests real open city data rather than synthetic samples.
Scalability: The pipeline is deployed in Docker, which makes the environment reproducible and the deployment easier to scale.
Preprocessing: The data is cleaned so that it conforms to the data warehouse schema, for example by dropping unneeded columns (data cleansing/sanitization).
Data Management Efficiency: Deployed a pgAdmin instance to manage the ingested data.
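
The preprocessing step described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the schema, the column names (request_id, category, latitude, longitude, notes), and the preprocess function are all hypothetical, since the real warehouse schema is not given here. It keeps only the columns the schema defines, trims whitespace, casts each value to the expected type, and skips rows that cannot be made to fit.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical target schema: column name -> function that casts raw text.
# The real warehouse schema is an assumption made for illustration.
WAREHOUSE_SCHEMA: Dict[str, Callable[[str], Any]] = {
    "request_id": int,
    "category": str,
    "latitude": float,
    "longitude": float,
}


def preprocess(rows: Iterable[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Keep only schema columns, trim whitespace, cast types, skip bad rows."""
    cleaned: List[Dict[str, Any]] = []
    for row in rows:
        record: Dict[str, Any] = {}
        try:
            for column, cast in WAREHOUSE_SCHEMA.items():
                value = (row.get(column) or "").strip()
                if not value:
                    raise ValueError(f"missing value for {column}")
                record[column] = cast(value)
        except ValueError:
            # The row has a missing or uncastable value, so it is skipped.
            continue
        cleaned.append(record)
    return cleaned


# Made-up sample rows standing in for records read from the data lake.
raw_rows = [
    {"request_id": " 101 ", "category": "Pothole", "latitude": "40.71",
     "longitude": "-74.00", "notes": "free text"},
    {"request_id": "abc", "category": "Noise", "latitude": "40.73",
     "longitude": "-73.99", "notes": ""},
    {"request_id": "103", "category": "", "latitude": "40.70",
     "longitude": "-74.01", "notes": ""},
]
result = preprocess(raw_rows)
print(result)
```

Only the first sample row survives: the second has a non-numeric request_id and the third has an empty category. The extra notes column is dropped because the assumed schema does not define it. A real pipeline might log or quarantine rejected rows instead of silently skipping them.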