Project Autodidact Progress Report (S01-M02-D01-AllParts)

Project Autodidact

Project Details: https://insightsbyse.com/projectautodidact/

Scott Ernst Bio: https://insightsbyse.com/aboutscotternst/

Project Contact: InsightsBySE@protonmail.com

Progress Report Scope (S01-M02-D01-AllParts)

Stage 1 of 4: Review of Mathematics, Probability, and Statistics

Module 2 of 3: Statistics, Probability, and Advanced Algebra

Day 1 of 5: Descriptive Statistics

Parts 1 through 10: See below

Summary Of Goals Achieved

  1. Reviewed procedures for calculating measures of central tendency of a dataset (mean, median, and mode)
  2. Reviewed procedures for calculating measures of dispersion in a dataset (range, interquartile range (IQR), variance, standard deviation, mean absolute deviation (MAD), and coefficient of variation (CV))
  3. Reviewed procedures for calculating measures of distribution in a dataset (percentiles and quartiles)
  4. Reviewed procedures for calculating measures of distribution shape in a dataset (skewness and kurtosis)
  5. Reviewed procedures for constructing and interpreting histograms to visualize and analyze the frequency distribution of numerical data
  6. Reviewed procedures for constructing and interpreting box plots to summarize and analyze the distribution of numerical data, enabling outlier detection and comparative analysis
  7. Reviewed procedures for calculating and interpreting the five-number summary (minimum, first quartile, median, third quartile, and maximum) in a dataset
  8. Reviewed procedures for calculating and interpreting the interquartile range (IQR) in a dataset.  NOTE: Kernel density estimation (KDE), cumulative distribution functions (CDFs), and Q-Q plots will be subsequently studied.
  9. Reviewed procedures to apply descriptive statistics to datasets containing data about military veterans
  10. Reviewed procedures for contextual summarization to interpret and present statistical results within the specific context of the dataset, ensuring findings are meaningful to stakeholders
  11. Reviewed procedures for data cleaning to ensure datasets are accurate, complete, and consistent before analysis
  12. Reviewed methods and procedures for normalizing data (min-max normalization, modified min-max normalization, decimal scaling, logarithmic transformation, and L2 normalization) and standardizing data (z-score standardization and robust scaling) in a dataset
  13. Reviewed methods and procedures for scaling data
  14. Reviewed definitions, notation, terminology, components, and principles for interpreting measures in data visualization
  15. Reviewed best practices for (1) visual encoding, (2) aesthetics, and (3) clarity in data visualization
  16. Reviewed methods for solving problems with missing data in a dataset (listwise deletion (aka complete case analysis), pairwise deletion (aka available case analysis), mean imputation, median imputation, mode imputation, regression imputation, multiple imputation, k-nearest neighbors imputation, hot-deck imputation, expectation-maximization imputation, and imputation based on machine learning algorithms (e.g., random forests or neural networks)).  NOTE: These methods will be subsequently studied in more detail.
  17. Reviewed methods for statistical evaluation of dataset quality (descriptive statistics, completeness checks, consistency checks, outlier detection, correlation analysis, distributional checks, and duplicate detection).  NOTE: These methods will be subsequently studied in more detail.
  18. Reviewed methods for detecting and evaluating dataset outliers (z-score, modified z-score, interquartile range (IQR), Mahalanobis distance, isolation forest, DBSCAN, and local outlier factor).  NOTE: These methods will be subsequently studied in more detail.

Part 1 of 10

Goal 1 Statement: Review procedures for calculating measures of central tendency of a dataset (mean, median, and mode)

Goal 1 Plan: Read source materials

Goal 1 Work Product: None

Goal 1 Result: Completed
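
For reference, a minimal sketch of the central tendency calculations reviewed, using Python's built-in statistics module on an invented dataset (illustrative only, not a recorded work product for this goal):

```python
# Minimal sketch: measures of central tendency (mean, median, mode).
# The dataset is invented for demonstration only.
import statistics

data = [2, 3, 3, 5, 7, 8, 8, 8, 10]

mean_value = statistics.mean(data)      # arithmetic mean
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequent value

print(f"mean={mean_value}, median={median_value}, mode={mode_value}")
```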

Part 2 of 10

Goal 1 Statement: Review procedures for calculating measures of dispersion in a dataset (range, interquartile range (IQR), variance, standard deviation, mean absolute deviation (MAD), and coefficient of variation (CV))

Goal 1 Plan: Read source materials

Goal 1 Work Product: None

Goal 1 Result: Completed
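
Illustrative sketch of the dispersion measures reviewed, computed with numpy on an invented dataset (not a recorded work product):

```python
# Minimal sketch: range, IQR, variance, standard deviation, MAD, and CV.
import numpy as np

data = np.array([4, 7, 9, 10, 12, 15, 18, 21, 25], dtype=float)

data_range = data.max() - data.min()
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                                # interquartile range
variance = data.var(ddof=1)                  # sample variance
std_dev = data.std(ddof=1)                   # sample standard deviation
mad = np.mean(np.abs(data - data.mean()))    # mean absolute deviation
cv = std_dev / data.mean()                   # coefficient of variation

print(data_range, iqr, variance, std_dev, mad, cv)
```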

Part 3 of 10

Goal 1 Statement: Review procedures for calculating measures of distribution in a dataset (percentiles and quartiles)

Goal 1 Plan: Read source materials

Goal 1 Work Product: None

Goal 1 Result: Completed
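
Illustrative sketch of percentile and quartile calculations with numpy (invented values):

```python
# Minimal sketch: percentiles and quartiles.
import numpy as np

data = np.array([12, 15, 18, 20, 22, 25, 28, 30, 35, 40], dtype=float)

p10, p90 = np.percentile(data, [10, 90])        # 10th and 90th percentiles
q1, q2, q3 = np.percentile(data, [25, 50, 75])  # quartiles (Q2 is the median)

print(f"P10={p10}, P90={p90}")
print(f"Q1={q1}, Q2(median)={q2}, Q3={q3}")
```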

Part 4 of 10

Goal 1 Statement: Review procedures for calculating measures of distribution shape in a dataset (skewness and kurtosis)

Goal 1 Plan: Read source materials

Goal 1 Work Product: None

Goal 1 Result: Completed
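
Illustrative sketch of moment-based skewness and excess kurtosis, computed directly from their definitions on an invented dataset; library routines (for example, scipy.stats) may apply bias corrections and report slightly different values:

```python
# Minimal sketch: population skewness and excess kurtosis from the
# standardized third and fourth moments.
import numpy as np

data = np.array([2, 3, 3, 4, 5, 5, 6, 9, 15], dtype=float)

mean = data.mean()
std = data.std()  # population standard deviation (ddof=0)

skewness = np.mean(((data - mean) / std) ** 3)             # third standardized moment
excess_kurtosis = np.mean(((data - mean) / std) ** 4) - 3  # fourth moment minus 3

print(f"skewness={skewness:.3f}, excess kurtosis={excess_kurtosis:.3f}")
```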

Part 5 of 10

Goal 1 Statement: Review procedures for constructing and interpreting histograms to visualize and analyze the frequency distribution of numerical data

Goal 1 Plan: Read source materials

Goal 1 Work Product: None

Goal 1 Result: Completed
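
Illustrative sketch of histogram construction with matplotlib, using simulated data:

```python
# Minimal sketch: histogram of a simulated numeric variable.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
data = rng.normal(loc=50, scale=10, size=500)  # simulated values

plt.hist(data, bins=20, edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram of a simulated dataset")
plt.show()
```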

Part 6 of 10

Goal 1 Statement: Review procedures for constructing and interpreting box plots to summarize and analyze the distribution of numerical data, enabling outlier detection and comparative analysis

Goal 1 Plan: Read source materials

Goal 1 Work Product: None

Goal 1 Result: Completed
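
Illustrative sketch of a comparative box plot with matplotlib, using two simulated groups:

```python
# Minimal sketch: side-by-side box plots for comparative analysis.
# Points beyond 1.5 * IQR from the box edges are drawn as individual
# outlier markers by default.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
group_a = rng.normal(loc=60, scale=8, size=200)
group_b = rng.normal(loc=55, scale=12, size=200)

plt.boxplot([group_a, group_b])
plt.xticks([1, 2], ["Group A", "Group B"])
plt.ylabel("Value")
plt.title("Box plot comparison of two simulated groups")
plt.show()
```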

Goal 2 Statement: Review procedures for calculating and interpreting the five-number summary (minimum, first quartile, median, third quartile, and maximum) in a dataset

Goal 2 Plan: Read source materials

Goal 2 Work Product: None

Goal 2 Result: Completed

Goal 3 Statement: Review procedures for calculating and interpreting the interquartile range (IQR) in a dataset

Goal 3 Plan: Read source materials

Goal 3 Work Product: None

Goal 3 Result: Completed
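
Illustrative sketch covering Goals 2 and 3: the five-number summary, the IQR, and the common 1.5 x IQR fences for flagging potential outliers (invented values):

```python
# Minimal sketch: five-number summary, IQR, and IQR-based fences.
import numpy as np

data = np.array([3, 7, 8, 9, 11, 12, 13, 15, 18, 42], dtype=float)

minimum = data.min()
q1, median, q3 = np.percentile(data, [25, 50, 75])
maximum = data.max()
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"five-number summary: {minimum}, {q1}, {median}, {q3}, {maximum}")
print(f"IQR={iqr}, fences=({lower_fence}, {upper_fence}), outliers={outliers}")
```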

Part 7 of 10

Goal 1 Statement: Review procedures to apply descriptive statistics to datasets containing data about military veterans

Goal 1 Plan: Read source materials

Goal 1 Work Product: None

Goal 1 Result: Completed

Goal 2 Statement: Review procedures for contextual summarization to interpret and present statistical results within the specific context of the dataset, ensuring findings are meaningful to stakeholders

Goal 2 Plan: Read source materials

Goal 2 Work Product: None

Goal 2 Result: Completed

Goal 3 Statement: Review procedures for data cleaning to ensure datasets are accurate, complete, and consistent before analysis

Goal 3 Plan: Read source materials

Goal 3 Work Product: None

Goal 3 Result: Completed
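
Illustrative sketch of basic data-cleaning steps with pandas on an invented table of veteran records; the column names and values are hypothetical:

```python
# Minimal sketch: standardizing categories, coercing numeric columns,
# and removing duplicates and incomplete rows. All data are invented.
import pandas as pd

df = pd.DataFrame({
    "age": [34, 51, None, 29, 51],
    "branch": ["Army", "navy", "Navy", "Air Force", "navy"],
    "years_of_service": ["4", "20", "6", "x", "20"],
})

# Standardize inconsistent categorical values.
df["branch"] = df["branch"].str.strip().str.title()

# Coerce numeric columns; invalid entries become NaN instead of raising.
df["years_of_service"] = pd.to_numeric(df["years_of_service"], errors="coerce")

# Remove exact duplicate rows and rows with missing values.
df = df.drop_duplicates().dropna()

print(df)
```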

Part 8 of 10

Goal 1 Statement: Review methods and procedures for normalizing data (min-max normalization, modified min-max normalization, decimal scaling, logarithmic transformation, and L2 normalization) and standardizing data (z-score standardization and robust scaling) in a dataset

Goal 1 Plan: Read source materials

Goal 1 Work Product: None

Goal 1 Result: Completed
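
Illustrative sketch of the normalization and standardization methods reviewed, implemented directly with numpy on an invented array:

```python
# Minimal sketch: normalization and standardization methods.
import numpy as np

x = np.array([2.0, 5.0, 9.0, 14.0, 20.0, 100.0])

# Min-max normalization to [0, 1].
min_max = (x - x.min()) / (x.max() - x.min())

# Modified min-max normalization to an arbitrary range [a, b].
a, b = -1.0, 1.0
min_max_ab = a + (x - x.min()) * (b - a) / (x.max() - x.min())

# Decimal scaling: divide by 10**j, where j is the smallest integer
# such that max(|x|) / 10**j < 1.
j = int(np.floor(np.log10(np.abs(x).max()))) + 1
decimal_scaled = x / (10 ** j)

# Logarithmic transformation (values must be positive).
log_transformed = np.log(x)

# L2 normalization: scale the vector to unit Euclidean length.
l2_normalized = x / np.linalg.norm(x)

# Z-score standardization: mean 0, standard deviation 1.
z_scores = (x - x.mean()) / x.std()

# Robust scaling: center on the median and divide by the IQR.
q1, q3 = np.percentile(x, [25, 75])
robust_scaled = (x - np.median(x)) / (q3 - q1)
```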

Goal 2 Statement: Review methods and procedures for scaling data

Goal 2 Plan: Read source materials

Goal 2 Work Product: None

Goal 2 Result: Completed

Part 9 of 10

Goal 1 Statement: Review definitions, notation, terminology, components, and principles for interpreting measures in data visualization

Goal 1 Plan: Read source materials

Goal 1 Work Product: None

Goal 1 Result: Completed

Goal 2 Statement: Review best practices for (1) visual encoding, (2) aesthetics, and (3) clarity in data visualization

Goal 2 Plan: Read source materials

Goal 2 Work Product: None

Goal 2 Result: Completed
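
Illustrative sketch of a chart that applies several of the clarity practices reviewed (descriptive title, labeled axes with units, legend, readable ticks); the data are invented:

```python
# Minimal sketch: a simple, clearly labeled line chart.
import matplotlib.pyplot as plt

years = [2019, 2020, 2021, 2022, 2023]
enrollment = [120, 135, 150, 160, 180]  # invented values

fig, ax = plt.subplots()
ax.plot(years, enrollment, marker="o", label="Program enrollment")
ax.set_title("Program enrollment by year (simulated data)")
ax.set_xlabel("Year")
ax.set_ylabel("Enrolled participants (count)")
ax.set_xticks(years)
ax.legend()
plt.show()
```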

Part 10 of 10

Goal 1 Statement: Review methods for solving problems with missing data in a dataset (listwise deletion (aka complete case analysis), pairwise deletion (aka available case analysis), mean imputation, median imputation, mode imputation, regression imputation, multiple imputation, k-nearest neighbors imputation, hot-deck imputation, expectation-maximization imputation, and imputation based on machine learning algorithms (e.g., random forests or neural networks)).  NOTE: These methods will be subsequently studied in more detail.

Goal 1 Plan: Read source materials

Goal 1 Work Product: None

Goal 1 Result: Completed
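
Illustrative sketch of the simplest deletion and imputation methods reviewed, using pandas on an invented table; the model-based methods (regression, multiple imputation, KNN, hot-deck, EM, and machine-learning imputation) are not shown here:

```python
# Minimal sketch: listwise deletion and mean/median/mode imputation.
# All data and column names are invented.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [34.0, np.nan, 29.0, 45.0, np.nan, 52.0],
    "branch": ["Army", "Navy", None, "Navy", "Army", None],
})

# Listwise deletion: keep only complete rows.
complete_cases = df.dropna()

# Mean and median imputation for a numeric column.
mean_imputed = df["age"].fillna(df["age"].mean())
median_imputed = df["age"].fillna(df["age"].median())

# Mode imputation for a categorical column.
mode_imputed = df["branch"].fillna(df["branch"].mode().iloc[0])

print(complete_cases)
print(mean_imputed, median_imputed, mode_imputed, sep="\n")
```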

Goal 2 Statement: Review methods for statistical evaluation of dataset quality (descriptive statistics, completeness checks, consistency checks, outlier detection, correlation analysis, distributional checks, and duplicate detection).  NOTE: These methods will be subsequently studied in more detail.

Goal 2 Plan: Read source materials

Goal 2 Work Product: None

Goal 2 Result: Completed
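
Illustrative sketch of quick dataset-quality checks with pandas (completeness, duplicates, summary statistics, and a simple consistency check) on an invented table:

```python
# Minimal sketch: statistical evaluation of dataset quality.
# All data and column names are invented.
import pandas as pd

df = pd.DataFrame({
    "age": [34, 51, 29, 29, None, 120],
    "years_of_service": [4, 20, 6, 6, 8, 95],
})

# Completeness: fraction of missing values per column.
missing_fraction = df.isna().mean()

# Duplicate detection: count of fully duplicated rows.
duplicate_rows = df.duplicated().sum()

# Descriptive statistics and a simple consistency check on plausible ages.
summary = df.describe()
implausible_ages = df[(df["age"] < 17) | (df["age"] > 110)]

print(missing_fraction, duplicate_rows, summary, implausible_ages, sep="\n\n")
```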

Goal 3 Statement: Review methods for detecting and evaluating dataset outliers (z-score, modified z-score, interquartile range (IQR), Mahalanobis distance, isolation forest, DBSCAN, and local outlier factor).  NOTE: These methods will be subsequently studied in more detail.

Goal 3 Plan: Read source materials

Goal 3 Work Product: None

Goal 3 Result: Completed
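
Illustrative sketch of the three univariate outlier-detection rules reviewed (z-score, modified z-score, and the IQR rule) on an invented array; the multivariate and model-based methods listed above are not shown here:

```python
# Minimal sketch: univariate outlier detection rules on simulated data
# with one injected extreme value.
import numpy as np

rng = np.random.default_rng(seed=2)
data = np.append(rng.normal(loc=12, scale=1.5, size=50), 120.0)

# Z-score rule: flag points more than 3 standard deviations from the mean.
z = (data - data.mean()) / data.std()
z_outliers = data[np.abs(z) > 3]

# Modified z-score: uses the median and the median absolute deviation,
# so it is more robust to the outliers themselves; a common cutoff is 3.5.
median = np.median(data)
mad = np.median(np.abs(data - median))
modified_z = 0.6745 * (data - median) / mad
modified_z_outliers = data[np.abs(modified_z) > 3.5]

# IQR rule: flag points beyond 1.5 * IQR from the quartiles.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

print(z_outliers, modified_z_outliers, iqr_outliers, sep="\n")
```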