Project Autodidact
Project Details: https://insightsbyse.com/projectautodidact/
Scott Ernst Bio: https://insightsbyse.com/aboutscotternst/
Project Contact: InsightsBySE@protonmail.com
Progress Report Scope (S01-M02-D01-AllParts)
Stage 1 of 4: Review of Mathematics, Probability, and Statistics
Module 2 of 3: Statistics, Probability, and Advanced Algebra
Day 1 of 5: Descriptive Statistics
Parts 1 through 10: See below
Summary of Goals Achieved
- Reviewed procedures for calculating measures of central tendency of a dataset (mean, median, and mode)
- Reviewed procedures for calculating measures of dispersion in a dataset (range, interquartile range (IQR), variance, standard deviation, mean absolute deviation (MAD), and coefficient of variation (CV))
- Reviewed procedures for calculating measures of distribution in a dataset (percentiles and quartiles)
- Reviewed procedures for calculating measures of distribution shape in a dataset (skewness and kurtosis)
- Reviewed procedures for constructing and interpreting histograms to visualize and analyze the frequency distribution of numerical data
- Reviewed procedures for constructing and interpreting box plots to summarize and analyze the distribution of numerical data, enabling outlier detection and comparative analysis
- Reviewed procedures for calculating and interpreting the five-number summary (minimum, first quartile, median, third quartile, and maximum) in a dataset
- Reviewed procedures for calculating and interpreting the interquartile range (IQR) in a dataset. NOTE: Kernel density estimation (KDE), cumulative distribution functions (CDFs), and Q-Q plots will be subsequently studied.
- Reviewed procedures to apply descriptive statistics to datasets containing data about military veterans
- Reviewed procedures for contextual summarization to interpret and present statistical results within the specific context of the dataset, ensuring findings are meaningful to stakeholders
- Reviewed procedures for data cleaning to ensure datasets are accurate, complete, and consistent before analysis
- Reviewed methods and procedures for normalizing data (min-max normalization, modified min-max normalization, decimal scaling, logarithmic transformation, and L2 normalization) and standardizing data (z-score standardization and robust scaling) in a dataset
- Reviewed methods and procedures for scaling data
- Reviewed definitions, notation, terminology, components, and principles for interpreting measures in data visualization
- Reviewed best practices for (1) visual encoding, (2) aesthetics, and (3) clarity in data visualization
- Reviewed methods for solving problems with missing data in a dataset (listwise deletion (aka complete case analysis), pairwise deletion (aka available case analysis), mean imputation, median imputation, mode imputation, regression imputation, multiple imputation, k-nearest neighbors imputation, hot-deck imputation, expectation-maximization imputation, and imputation based on machine learning algorithms (e.g., random forests or neural networks)). NOTE: These methods will be subsequently studied in more detail.
- Reviewed methods for statistical evaluation of dataset quality (descriptive statistics, completeness checks, consistency checks, outlier detection, correlation analysis, distributional checks, and duplicate detection). NOTE: These methods will be subsequently studied in more detail.
- Reviewed methods for detecting and evaluating dataset outliers (z-score, modified z-score, interquartile range, Mahalanobis distance, isolation forest, DBSCAN, and local outlier factor). NOTE: These methods will be subsequently studied in more detail.
Part 1 of 10
Goal 1 Statement: Review procedures for calculating measures of central tendency of a dataset (mean, median, and mode)
Goal 1 Plan: Read source materials
Goal 1 Work Product: None
Goal 1 Result: Completed
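A minimal sketch of the Part 1 central tendency calculations, written for illustration and not taken from the source materials. The list `data` is made-up, and the helper names `mean`, `median`, and `modes` are hypothetical. The standard-library `statistics` equivalents are printed alongside as a cross-check.

```python
from collections import Counter
import statistics

# Made-up illustrative sample (not from the source materials)
data = [4, 8, 8, 5, 3, 8, 6, 4]

def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def modes(values):
    """Every value tied for the highest frequency (a dataset can be multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(mean(data), median(data), modes(data))
# Cross-check with the standard library
print(statistics.mean(data), statistics.median(data), statistics.multimode(data))
```

Returning a list from `modes` (rather than a single value) keeps ties visible, which matters for small or categorical datasets.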
Part 2 of 10
Goal 1 Statement: Review procedures for calculating measures of dispersion in a dataset (range, interquartile range (IQR), variance, standard deviation, mean absolute deviation (MAD), and coefficient of variation (CV))
Goal 1 Plan: Read source materials
Goal 1 Work Product: None
Goal 1 Result: Completed
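A minimal sketch of the Part 2 dispersion measures, assuming a small made-up sample; the helper names are hypothetical. Two conventions affect the numbers: variance and standard deviation divide by n - 1 for a sample and by n for a population, and the IQR depends on the quartile method (this sketch uses the standard library's `method="inclusive"`). MAD is computed as the mean absolute deviation around the mean, as named in Part 2; the median absolute deviation is a different statistic that shares the abbreviation.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative values

def data_range(values):
    return max(values) - min(values)

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    m = sum(values) / len(values)
    ss = sum((x - m) ** 2 for x in values)
    return ss / (len(values) - 1 if sample else len(values))

def std_dev(values, sample=True):
    return math.sqrt(variance(values, sample))

def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

def coefficient_of_variation(values, sample=True):
    """Standard deviation relative to the mean; only meaningful for a positive mean."""
    return std_dev(values, sample) / (sum(values) / len(values))

def iqr(values):
    """Q3 - Q1 with 'inclusive' quartiles; other quartile conventions give slightly different values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

print(data_range(data), variance(data), std_dev(data, sample=False))
print(mean_absolute_deviation(data), coefficient_of_variation(data, sample=False), iqr(data))
```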
Part 3 of 10
Goal 1 Statement: Review procedures for calculating measures of distribution in a dataset (percentiles and quartiles)
Goal 1 Plan: Read source materials
Goal 1 Work Product: None
Goal 1 Result: Completed
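A minimal sketch of percentile and quartile calculation, assuming linear interpolation between closest ranks. This is only one of several textbook conventions (it matches NumPy's default and the standard library's `statistics.quantiles(method="inclusive")`), so results from the source materials may differ slightly. The function names and sample values are hypothetical.

```python
import math

def percentile(values, p):
    """p-th percentile (0 to 100) by linear interpolation between the two nearest ranks."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    s = sorted(values)
    pos = (len(s) - 1) * p / 100  # fractional position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def quartiles(values):
    """(Q1, Q2, Q3) as the 25th, 50th, and 75th percentiles."""
    return percentile(values, 25), percentile(values, 50), percentile(values, 75)

data = [7, 15, 36, 39, 40, 41]  # made-up illustrative values
print(quartiles(data))
print(percentile(data, 90))
```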
Part 4 of 10
Goal 1 Statement: Review procedures for calculating measures of distribution shape in a dataset (skewness and kurtosis)
Goal 1 Plan: Read source materials
Goal 1 Work Product: None
Goal 1 Result: Completed
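A minimal sketch of skewness and kurtosis using the moment-based population formulas (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3, where mk is the k-th central moment). Statistical packages often apply small-sample bias corrections, so their values can differ from this sketch. The helper names and the two sample lists are hypothetical.

```python
def _central_moment(values, k):
    """k-th central moment using the population (divide-by-n) convention."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """0 for symmetric data, positive for a long right tail, negative for a long left tail."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Kurtosis minus 3, so a normal distribution scores about 0."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]      # made-up values
right_skewed = [1, 1, 1, 2, 10]  # made-up values with one large value
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

The flat, evenly spaced `symmetric` list has negative excess kurtosis (lighter tails than a normal curve), while the single large value in `right_skewed` drives the skewness positive.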
Part 5 of 10
Goal 1 Statement: Review procedures for constructing and interpreting histograms to visualize and analyze the frequency distribution of numerical data
Goal 1 Plan: Read source materials
Goal 1 Work Product: None
Goal 1 Result: Completed
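A minimal sketch of the counting step behind a histogram, assuming equal-width bins and a plain-text bar display in place of a plotting library. The function name `histogram_counts`, the bin count, and the sample values are hypothetical; bins are half-open [left, right) except the last, which also includes the maximum.

```python
def histogram_counts(values, num_bins):
    """Return (bin edges, counts) for num_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in values:
        i = int((x - lo) / width) if width else 0  # all values equal: use the first bin
        counts[min(i, num_bins - 1)] += 1          # the maximum lands in the last bin
    edges = [lo + k * width for k in range(num_bins + 1)]
    return edges, counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # made-up illustrative values
edges, counts = histogram_counts(data, 4)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f} | {'#' * c}")
```

Changing `num_bins` can noticeably change the apparent shape of the distribution, which is why bin choice is part of interpreting a histogram.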
Part 6 of 10
Goal 1 Statement: Review procedures for constructing and interpreting box plots to summarize and analyze the distribution of numerical data, enabling outlier detection and comparative analysis
Goal 1 Plan: Read source materials
Goal 1 Work Product: None
Goal 1 Result: Completed
Goal 2 Statement: Review procedures for calculating and interpreting the five-number summary (minimum, first quartile, median, third quartile, and maximum) in a dataset
Goal 2 Plan: Read source materials
Goal 2 Work Product: None
Goal 2 Result: Completed
Goal 3 Statement: Review procedures for calculating and interpreting the interquartile range (IQR) in a dataset
Goal 3 Plan: Read source materials
Goal 3 Work Product: None
Goal 3 Result: Completed
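A minimal sketch tying together the five-number summary, the IQR, and the quantities a box plot draws. It assumes Tukey's conventional fences at 1.5 times the IQR beyond Q1 and Q3, and the standard library's "inclusive" quartiles; the function names and sample values are hypothetical.

```python
import statistics

def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) with 'inclusive' quartiles."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

def box_plot_stats(values, k=1.5):
    """Box, whisker ends, and outliers using fences at k * IQR beyond the quartiles."""
    _, q1, med, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    return {
        "box": (q1, med, q3),
        "iqr": iqr,
        # Whiskers stop at the most extreme observations still inside the fences
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(x for x in values if x < lower_fence or x > upper_fence),
    }

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]  # made-up values with one extreme point
print(five_number_summary(data))
print(box_plot_stats(data))
```

Note that the five-number summary reports the true maximum (30), while the box plot's upper whisker stops at 9 and draws 30 as a separate outlier point.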
Part 7 of 10
Goal 1 Statement: Review procedures to apply descriptive statistics to datasets containing data about military veterans
Goal 1 Plan: Read source materials
Goal 1 Work Product: None
Goal 1 Result: Completed
Goal 2 Statement: Review procedures for contextual summarization to interpret and present statistical results within the specific context of the dataset, ensuring findings are meaningful to stakeholders
Goal 2 Plan: Read source materials
Goal 2 Work Product: None
Goal 2 Result: Completed
Goal 3 Statement: Review procedures for data cleaning to ensure datasets are accurate, complete, and consistent before analysis
Goal 3 Plan: Read source materials
Goal 3 Work Product: None
Goal 3 Result: Completed
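A minimal sketch of basic data cleaning on hypothetical veteran-style records. The field names (`id`, `branch`, `age`), the plausible-age bounds of 17 to 120, and the choice to mark invalid ages as missing (rather than drop the row) are all illustrative assumptions, not rules from the source materials. The steps shown are trimming whitespace, standardizing capitalization, parsing and range-checking numbers, and removing exact duplicates after standardization.

```python
def parse_age(text, low=17, high=120):
    """Return an int age, or None if the text is not a whole number or is outside [low, high]."""
    try:
        age = int(text.strip())
    except ValueError:
        return None
    return age if low <= age <= high else None

def clean_records(raw):
    """Standardize each record, then keep only the first copy of any exact duplicate."""
    cleaned, seen = [], set()
    for rec in raw:
        row = {
            "id": rec["id"].strip(),
            "branch": rec["branch"].strip().title(),  # consistent capitalization
            "age": parse_age(rec["age"]),              # invalid ages become missing (None)
        }
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned

# Hypothetical raw records (not real data)
raw = [
    {"id": "001", "branch": " Army ", "age": "34"},
    {"id": "002", "branch": "navy", "age": "n/a"},
    {"id": "001", "branch": "army", "age": " 34"},         # duplicate once standardized
    {"id": "003", "branch": "Marine Corps", "age": "-5"},  # impossible value
]
for row in clean_records(raw):
    print(row)
```

Deduplicating after standardization matters: the first and third records only match once whitespace and capitalization are normalized.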
Part 8 of 10
Goal 1 Statement: Review methods and procedures for normalizing data (min-max normalization, modified min-max normalization, decimal scaling, logarithmic transformation, and L2 normalization) and standardizing data (z-score standardization and robust scaling) in a dataset
Goal 1 Plan: Read source materials
Goal 1 Work Product: None
Goal 1 Result: Completed
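A minimal sketch of several of the normalization and standardization methods listed in Part 8, each applied to a single list of numbers. The function names are hypothetical; `min_max` accepts an optional target range [new_min, new_max] as one common generalization of min-max scaling (whether this matches the "modified min-max normalization" in the source materials is an assumption). The z-score uses the sample standard deviation, and robust scaling uses the median and "inclusive" IQR.

```python
import math
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale linearly so min -> new_min and max -> new_max (fails if all values are equal)."""
    lo, hi = min(values), max(values)
    return [new_min + (x - lo) / (hi - lo) * (new_max - new_min) for x in values]

def decimal_scaling(values):
    """Divide by 10**j, using the smallest j that makes every |x| / 10**j < 1."""
    max_abs = max(abs(x) for x in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [x / 10 ** j for x in values]

def log_transform(values):
    """log(1 + x); assumes every value is greater than -1 (e.g., non-negative amounts)."""
    return [math.log1p(x) for x in values]

def l2_normalize(values):
    """Divide by the Euclidean length so the result has length 1."""
    norm = math.sqrt(sum(x * x for x in values))
    return [x / norm for x in values]

def z_score(values):
    """(x - mean) / sample standard deviation."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(x - m) / s for x in values]

def robust_scale(values):
    """(x - median) / IQR, which is less sensitive to outliers than the z-score."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(x - med) / (q3 - q1) for x in values]

data = [10, 20, 30, 40, 50]  # made-up illustrative values
print(min_max(data), min_max(data, -1, 1))
print(decimal_scaling(data), z_score(data), robust_scale(data))
```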
Goal 2 Statement: Review methods and procedures for scaling data
Goal 2 Plan: Read source materials
Goal 2 Work Product: None
Goal 2 Result: Completed
Part 9 of 10
Goal 1 Statement: Review definitions, notation, terminology, components, and principles for interpreting measures in data visualization
Goal 1 Plan: Read source materials
Goal 1 Work Product: None
Goal 1 Result: Completed
Goal 2 Statement: Review best practices for (1) visual encoding, (2) aesthetics, and (3) clarity in data visualization
Goal 2 Plan: Read source materials
Goal 2 Work Product: None
Goal 2 Result: Completed
Part 10 of 10
Goal 1 Statement: Review methods for solving problems with missing data in a dataset (listwise deletion (aka complete case analysis), pairwise deletion (aka available case analysis), mean imputation, median imputation, mode imputation, regression imputation, multiple imputation, k-nearest neighbors imputation, hot-deck imputation, expectation-maximization imputation, and imputation based on machine learning algorithms (e.g., random forests or neural networks)). NOTE: These methods will be subsequently studied in more detail.
Goal 1 Plan: Read source materials
Goal 1 Work Product: None
Goal 1 Result: Completed
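A minimal sketch of the simplest missing-data methods from Part 10: listwise deletion and mean, median, or mode imputation. Missing values are assumed to be stored as `None`; the records and the helper names `listwise_delete` and `impute` are hypothetical. The regression, multiple, k-nearest neighbors, hot-deck, EM, and machine-learning imputation methods are not shown.

```python
import statistics

def listwise_delete(rows):
    """Complete case analysis: keep only rows with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute(rows, column, strategy):
    """Fill None in one column with the mean, median, or mode of its observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fillers = {"mean": statistics.mean, "median": statistics.median, "mode": statistics.mode}
    if strategy not in fillers:
        raise ValueError(f"unknown strategy: {strategy}")
    fill = fillers[strategy](observed)
    # Build new dicts so the original rows are left unchanged
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

# Hypothetical records (not real data)
rows = [
    {"age": 34, "years_served": 4, "branch": "Army"},
    {"age": None, "years_served": 20, "branch": "Navy"},
    {"age": 51, "years_served": None, "branch": "Army"},
    {"age": 29, "years_served": 6, "branch": None},
]
print(listwise_delete(rows))
print(impute(rows, "age", "mean"))
```

Listwise deletion keeps only one of the four rows here, which shows how quickly it can shrink a dataset. Single-value imputation keeps every row but understates the column's variability, one motivation for the multiple imputation methods listed above.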
Goal 2 Statement: Review methods for statistical evaluation of dataset quality (descriptive statistics, completeness checks, consistency checks, outlier detection, correlation analysis, distributional checks, and duplicate detection). NOTE: These methods will be subsequently studied in more detail.
Goal 2 Plan: Read source materials
Goal 2 Work Product: None
Goal 2 Result: Completed
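A minimal sketch of three of the dataset-quality checks from Part 10: completeness (share of non-missing values per column), duplicate detection, and rule-based consistency checks. The records, field names, and both rules (an age range of 17 to 120, and years served less than age) are hypothetical assumptions chosen for illustration.

```python
from collections import Counter

def completeness(rows):
    """Share of non-missing (non-None) values in each column."""
    return {c: sum(r[c] is not None for r in rows) / len(rows) for c in rows[0]}

def duplicate_rows(rows):
    """Rows that appear more than once, with their counts."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}

def consistency_violations(rows, rules):
    """(row index, rule name) for every row that fails a rule."""
    return [(i, name) for i, r in enumerate(rows) for name, ok in rules.items() if not ok(r)]

# Hypothetical records and rules (not real data)
rows = [
    {"id": 1, "age": 34, "years_served": 4},
    {"id": 2, "age": None, "years_served": 20},
    {"id": 2, "age": None, "years_served": 20},
    {"id": 3, "age": 19, "years_served": 25},
]
rules = {
    "age_in_range": lambda r: r["age"] is None or 17 <= r["age"] <= 120,
    "years_served_lt_age": lambda r: r["age"] is None or r["years_served"] < r["age"],
}
print(completeness(rows))
print(duplicate_rows(rows))
print(consistency_violations(rows, rules))
```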
Goal 3 Statement: Review methods for detecting and evaluating dataset outliers (z-score, modified z-score, interquartile range, Mahalanobis distance, isolation forest, DBSCAN, and local outlier factor). NOTE: These methods will be subsequently studied in more detail.
Goal 3 Plan: Read source materials
Goal 3 Work Product: None
Goal 3 Result: Completed
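A minimal sketch of the three univariate outlier rules from Part 10: the z-score (threshold 3), the modified z-score (0.6745 times the distance from the median divided by the median absolute deviation, threshold 3.5, as proposed by Iglewicz and Hoaglin), and the 1.5 times IQR fences. The thresholds are common defaults rather than values from the source materials, and the sample data and function names are hypothetical. Mahalanobis distance, isolation forest, DBSCAN, and local outlier factor are not covered here.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` population standard deviations from the mean."""
    m, s = statistics.mean(values), statistics.pstdev(values)
    return [x for x in values if abs(x - m) / s > threshold]

def modified_zscore_outliers(values, threshold=3.5):
    """Median/MAD-based rule, which the outliers themselves distort much less."""
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; modified z-score is undefined")
    return [x for x in values if abs(0.6745 * (x - med) / mad) > threshold]

def iqr_outliers(values, k=1.5):
    """Values beyond k * IQR below Q1 or above Q3 ('inclusive' quartiles)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [x for x in values if x < q1 - k * iqr or x > q3 + k * iqr]

data = [10, 12, 11, 13, 12, 11, 10, 12, 95]  # made-up values with one extreme point
print(zscore_outliers(data), modified_zscore_outliers(data), iqr_outliers(data))
```

On this small sample the plain z-score rule flags nothing: the extreme value inflates the standard deviation, and with population standard deviation no |z| can exceed the square root of n - 1 (about 2.83 for n = 9). The median-based and IQR-based rules both flag 95.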