Fondamenti di Analisi dei Dati e Laboratorio
This page contains information on the Fondamenti di Analisi dei Dati e Laboratorio course, including the syllabus, lecture notes, and other resources.
2025/2026
Changes in 2025/2026
In academic year 2025/2026, the syllabus and the course were completely revised to adjust the workload as the course moves from the Master’s Degree to the Bachelor’s Degree. The course now focuses on core concepts and leaves out details that are covered by other courses or are better suited to Master’s-level study.
Structure
The course includes a theory module (6 CFU) and a laboratory module (3 CFU). Theory sessions focus on core topics and algorithms, covering intuition and examples. Laboratory sessions focus on solving real data analysis problems using Python through live coding sessions.
Examination
The examination consists of a written exam and a project in Python. Students who pass two in itinere (mid-course) tests held during the course are exempt from the written exam.
Data Science Challenge
At the end of the course, a data analysis challenge will be organized. Students will work on a real data analysis problem in 24/48 hours and present their results. The challenge can be tackled individually or in groups of up to 3 students. A presentation graded as sufficient exempts the students from the project.
Notes
Open-source lecture notes are provided during the course: fadlecturenotes2526. Notes from previous years remain available for reference but are not subject to examination; students should refer to the current year’s notes.
Example Exam Questions | Infographic (PDF)
Concise Syllabus
- Introduction to Data Analysis: Aims, relevance of data, the data analysis lifecycle, and course structure.
- Exploratory Data Analysis and Descriptive Statistics: Data acquisition, types of data, tabular datasets, measures of central tendency and spread, data visualization, wrangling, and normalization.
- Introduction to Laboratories: Python fundamentals, essential libraries (NumPy, SciPy, Matplotlib, Seaborn, Plotly, Statsmodels, scikit-learn), and hands-on data cleaning.
- Probability for Data Analysis and Data Distributions: Uncertainty, random variables, probability estimation, joint and conditional probabilities, expectation, variance, covariance, common probability distributions.
- Data Association: Pearson’s chi-square statistic, Cramér’s V, covariance, and various correlation coefficients (Pearson, point-biserial, Spearman, Kendall).
- Statistical Inference: Sampling, standard error, confidence intervals, bias-variance trade-off, hypothesis testing, and normality assessment.
- Linear Regression: Simple and multiple linear regression, estimating coefficients, model accuracy, variable selection, qualitative predictors, and interaction terms.
- Logistic Regression: Relationship between continuous and binary variables, logistic function, model interpretation, and multinomial logistic regression.
- Introduction to Predictive Analysis: Overfitting, empirical risk minimization, generalization, model selection (train/validation/test split, cross-validation), and regularization (Ridge, Lasso).
- Classification Problems: Classification vs. regression, evaluation measures, logistic regression as a discriminative classifier, softmax regression, and handling data imbalance.
- Data Representation and Clustering: Feature extractors, supervised vs. unsupervised learning, and K-Means clustering.
- Density Estimation: Parametric vs. non-parametric density estimation, Kernel Density Estimation, Parzen Window, and Maximum Likelihood.
- Principal Component Analysis (PCA): Definition and applications to unsupervised dimensionality reduction.
- Supervised Dimensionality Reduction: Fisher Linear Discriminant and Linear Discriminant Analysis (LDA).
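
As a rough illustration of the kind of computation behind the descriptive statistics and data association topics listed above, the sketch below computes a mean, a median, a sample standard deviation, the sample covariance, and the Pearson correlation coefficient using only the Python standard library. The data values, variable names, and helper functions are invented for this example and are not taken from the course material.

```python
import statistics

# Hypothetical toy data (invented for illustration): hours studied and exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 70.0, 72.0]


def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_correlation(x, y):
    """Pearson correlation: covariance divided by the product of the sample standard deviations."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))


# Measures of central tendency and spread.
mean_hours = statistics.fmean(hours)
median_scores = statistics.median(scores)
std_scores = statistics.stdev(scores)

# Measures of association between the two variables.
cov = sample_covariance(hours, scores)
r = pearson_correlation(hours, scores)

print(f"mean hours:     {mean_hours:.2f}")
print(f"median score:   {median_scores:.2f}")
print(f"std of scores:  {std_scores:.2f}")
print(f"covariance:     {cov:.2f}")
print(f"Pearson r:      {r:.3f}")
```

In practice, library routines (for example those in NumPy or SciPy) would normally replace the hand-written helpers; they are spelled out here only to make the formulas visible.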
