The course consists of three modules of two sessions each:
- Data preparation (sessions 1 & 2),
- Data visualization & interactive graphics (sessions 3 & 4),
- Dashboards & structured data (sessions 5 & 6).
Session 1 (22 September): Data preparation
The basics of data preparation with Python will be covered:
- Common operations & libraries
- Challenges in batch data preprocessing
  - Missing or incomplete records
  - Outliers or anomalies
  - Improperly formatted / structured data
  - Inconsistent values and non-standardized categorical variables
  - Limited or sparse features / attributes
- Functions of Pandas (or PySpark)
- Visualization of data
- Best practices for data cleaning & software engineering
  - Readable & documented code
  - Logging
  - Handling null values
  - Data consistency, standardization & documentation
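To give a flavour of the topics above, here is a minimal sketch of handling null values, standardizing a categorical variable, and logging each correction. It is not course material: it uses only the Python standard library rather than Pandas, and the field names (`country`, `age`), the alias table, and the choice of a median fill are all assumptions made for illustration.

```python
import logging
from statistics import median

logger = logging.getLogger("data_prep")

# Hypothetical mapping from raw labels to one standard category name.
COUNTRY_ALIASES = {
    "be": "Belgium",
    "belgium": "Belgium",
    "nl": "Netherlands",
    "netherlands": "Netherlands",
}


def clean_records(records):
    """Return cleaned copies of `records` (a list of dicts).

    - Standardizes the categorical "country" field via COUNTRY_ALIASES;
      labels that are not in the table become None.
    - Fills missing "age" values (None) with the median of the observed ages.
    """
    observed_ages = [r["age"] for r in records if r.get("age") is not None]
    fallback_age = median(observed_ages) if observed_ages else None

    cleaned = []
    for index, record in enumerate(records):
        row = dict(record)  # copy so the input is left untouched

        raw_country = (row.get("country") or "").strip().lower()
        row["country"] = COUNTRY_ALIASES.get(raw_country)
        if row["country"] is None:
            logger.warning("row %d: unknown country %r", index, record.get("country"))

        if row.get("age") is None:
            logger.info("row %d: missing age, using median %s", index, fallback_age)
            row["age"] = fallback_age

        cleaned.append(row)
    return cleaned


raw = [
    {"country": " BE ", "age": 31},
    {"country": "Belgium", "age": None},
    {"country": "nl", "age": 45},
    {"country": "??", "age": 27},
]
cleaned = clean_records(raw)
```

Logging every correction, instead of fixing values silently, keeps the cleaning step auditable: the log shows which rows were changed and why.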
Session 2 (29 September): Advanced data preparation theory
This session covers more advanced topics:
- Best practices for feature engineering
  - Unstructured data
  - Common operations
- Data observability & data quality
  - Data profiling
  - Unit testing
- Moving to production
  - Docker
  - Workflow orchestrators (e.g., Dagster, Airflow)
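As an illustration of how data profiling and unit testing fit together, the sketch below defines a small profiling metric and tests it with the standard-library `unittest` module. The function name `missing_ratio` and the sample records are hypothetical, and this is only one possible way to structure such a check.

```python
import unittest


def missing_ratio(records, field):
    """Fraction of records in which `field` is absent or None.

    A basic profiling metric; returns 0.0 for an empty input.
    """
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


class MissingRatioTest(unittest.TestCase):
    def test_counts_none_and_absent_keys(self):
        records = [{"age": 31}, {"age": None}, {}, {"age": 45}]
        self.assertEqual(missing_ratio(records, "age"), 0.5)

    def test_empty_input(self):
        self.assertEqual(missing_ratio([], "age"), 0.0)


# Run the tests programmatically so the example also works in a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MissingRatioTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A check like this can run in a pipeline before each deployment, so that a sudden jump in missing values is caught before it reaches production.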
Session 3 (13 October) and session 4 (20 October): Data visualization, tools & libraries
In these workshop sessions, you will get an introduction to data visualization for data science, analysis, and storytelling. You will get hands-on experience working with popular Python libraries and standard tools on multiple datasets. The module is divided into the following five segments:
- Introduction to fundamentals of Data Visualization
- Introduction to Data Visualization using Matplotlib, Pandas, and NumPy in Python
- Advanced Data Visualization using Seaborn in Python
- Interactive Data Visualization using Plotly in Python
- Introduction to Data Reporting using Tableau
As a key takeaway from these sessions, you will gain theoretical knowledge of data visualization and develop hands-on skills working with practical datasets. The code and the datasets will be provided, and we will use Google Colab for the exercises from the first three segments. For the last segment, we recommend that you install Tableau Public.
Session 5 (27 October): Dashboards & linked visualizations
This session gives a hands-on introduction to dashboards and linked graphics, using Plotly Dash:
- Page layout, Dash components, custom components,
- Recap of interactive plots,
- Input, output, callbacks,
- Making components respond to each other,
- Examples,
- Integration of server-side computations and algorithms,
- Deployment.
Session 6 (10 November): Interactive graphics to explore complex data
This session covers creating visualizations of structured and complex data, and the integration of these into dashboards:
- Visualization of (large) time series and collections of time series,
- (Choropleth) Maps and their complexities, terrain visualization, Sankey diagram/flow map,
- Visualization using constructed axes: dimensionality reduction/representation learning, visualization of graphs/networks (layout algorithms, chord diagram), image databases.
The lecturers
Quinten Rosseel (sessions 1 & 2) is the technical founder at dotdash.ai (a knowledge graph startup), and has data science & engineering experience across various organizations (Bingli, Unilin, Volvo Group, Tomorrowland, Atlas Copco).
Katrien Verbert (sessions 3 & 4) is a professor at the Augment research group of KU Leuven. She obtained a doctoral degree in Computer Science in 2008 at KU Leuven, Belgium. She was a postdoctoral researcher of the Research Foundation – Flanders (FWO) at KU Leuven. She was an Assistant Professor at TU Eindhoven, the Netherlands (2013–2014) and Vrije Universiteit Brussel, Belgium (2014–2015). Her research interests include visualization techniques, recommender systems, explainable AI, and visual analytics. She has been involved in several European and Flemish projects on these topics, including the EU ROLE, STELLAR, STELA, ABLE, LALA, PERSFO, Smart Tags and BigDataGrapes projects.
Aditya Bhattacharya (sessions 3 & 4) is a doctoral researcher on Explainable AI at the Augment research group of the Department of Computer Science, KU Leuven. He obtained his Master of Science degree in Computer Science with a specialization in Machine Learning from Georgia Institute of Technology, USA. Before joining KU Leuven, Aditya worked in multiple roles at organizations such as Microsoft and Intel, on AI projects in domains related to Computer Vision, Natural Language Processing, Time Series Analysis, Classical Machine Learning, and Data Engineering.
Jefrey Lijffijt (sessions 5 & 6) is Professor of Data Science, Knowledge Discovery, and Visual Analytics at Ghent University. He has a background in algorithms, statistics, machine learning, and data visualization. He likes making tools that help others to better understand and utilize data.