Big Data & Data Wrangling
Seminar, Technische Hochschule Mittelhessen, StudiumPlus, 2023
Lecture in the B.Sc. Softwaretechnik (Data Science) program at THM StudiumPlus, 6 SWS (weekly contact hours) / 6 CrP (credit points). Offered since the summer semester (SoSe) 2023 and assessed by a 90-minute written exam.
The course treats data as the central resource for analysis and machine learning. Topics: Python fundamentals, data formats (CSV, JSON, SQL, HDF5), data import/export, NumPy, Pandas, data quality and cleaning, outlier analysis, visualization with Matplotlib/Seaborn, the DIKW pyramid, data lifecycle, data ethics, Big Data (5 V’s), and MapReduce with Hadoop/PySpark.
All sessions are hands-on in Jupyter Notebooks and use real datasets (e.g. Titanic, WHO, Chinook).