Standards in this Framework

| Standard | Lessons |
|---|---|
| 1.1.1 Identify the key stages of a data science project lifecycle. | |
| 1.1.2 Identify key roles and their responsibilities in a data science team (e.g., business stakeholders define objectives; data engineers build pipelines; data scientists develop models; domain experts provide expertise). | |
| 1.1.3 Define and create project goals and deliverables (e.g., problem statements, success metrics, expected outcomes, final reports, summary presentations). | |
| 1.1.4 Create and manage project timelines (e.g., milestones, deadlines, task dependencies, resource allocation). | |
| 1.1.5 Create a student portfolio including completed data science projects, reports, and other student-driven accomplishments. | |
| 1.2.1 Collaborate in team-based projects (e.g., team discussions, maintaining project logs, following protocols, code review, documentation). | |
| 1.2.2 Communicate technical findings to non-technical audiences (e.g., using data visualizations, presenting key insights, explaining complex concepts). | |
| 1.2.3 Make data-driven decisions and recommendations by proposing solutions and evaluating alternatives. | |
| 1.3.1 Identify ethical considerations in data collection, storage, and usage (e.g., data privacy, bias, transparency, consent). | |
| 1.3.2 Demonstrate responsible data handling practices (e.g., protecting sensitive information, citing data sources, maintaining data integrity). | |
| 1.3.3 Report results responsibly (e.g., addressing limitations, acknowledging uncertainties, preventing misinterpretation). | |
| 2.1.1 Differentiate between discrete and continuous probability distributions. | |
| 2.1.2 Calculate probabilities using discrete distributions (e.g., Uniform, Binomial, Poisson). | |
| 2.1.3 Calculate probabilities using continuous distributions (e.g., Uniform, Normal, Student's t, Exponential). | |
| 2.1.4 Apply Bayes' Theorem to calculate posterior probabilities. | |
| 2.2.1 Calculate p-values using a programming library and interpret the significance of the results. | |
| 2.2.2 Perform hypothesis testing. | |
| 2.2.3 Identify and explain Type I and Type II errors (e.g., false positives, false negatives). | |
| 2.2.4 Calculate and interpret confidence intervals. | |
| 2.2.5 Design and analyze experiments to compare outcomes (e.g., identifying control/treatment groups, selecting sample sizes, determining variables, implementing A/B tests). | |
| 2.3.1 Perform basic matrix operations including addition, subtraction, and scalar multiplication. | |
| 2.3.2 Calculate dot products and interpret their geometric meaning. | |
| 2.3.3 Apply matrix transformations to data sets. | |
| 2.3.4 Compute and interpret distances between vectors. | |
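
One way to demonstrate standards 2.1.2 through 2.1.4 is with `scipy.stats`; Python is an assumption here (the framework names no language for this strand), and every distribution parameter and rate below is an illustrative value, not part of the framework.

```python
# Minimal sketch for standards 2.1.2-2.1.4; all parameters are illustrative.
from scipy import stats

# 2.1.2 Discrete: P(X = 3) for Binomial(10, 0.5), P(X = 2) for Poisson(4)
p_binom = stats.binom.pmf(3, n=10, p=0.5)
p_pois = stats.poisson.pmf(2, mu=4)

# 2.1.3 Continuous: P(X <= 1.96) under the standard Normal
p_norm = stats.norm.cdf(1.96, loc=0, scale=1)

# 2.1.4 Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B), with hypothetical rates
p_a = 0.01               # prior P(A)
p_b_given_a = 0.95       # likelihood P(B|A)
p_b_given_not_a = 0.05   # P(B|not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # law of total probability
posterior = p_b_given_a * p_a / p_b

print(f"binomial={p_binom:.4f}  poisson={p_pois:.4f}  "
      f"normal={p_norm:.4f}  posterior={posterior:.4f}")
```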
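
For strand 2.2, the sketch below runs a two-sample t-test and builds a 95% confidence interval with SciPy; the control and treatment samples are synthetic, and the 5% significance threshold is a common convention rather than a framework requirement.

```python
# Sketch for standards 2.2.1-2.2.4 on synthetic control/treatment samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=50, scale=5, size=40)    # hypothetical control scores
treatment = rng.normal(loc=53, scale=5, size=40)  # hypothetical treatment scores

# 2.2.1-2.2.2 Two-sample t-test; H0: the group means are equal
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 -> reject H0 at the 5% level

# 2.2.4 95% confidence interval for the treatment-group mean
mean, sem = treatment.mean(), stats.sem(treatment)
low, high = stats.t.interval(0.95, df=len(treatment) - 1, loc=mean, scale=sem)
print(f"95% CI for treatment mean: ({low:.2f}, {high:.2f})")
```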
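
Strand 2.3 maps directly onto NumPy; the matrices, vectors, and rotation angle in this sketch are arbitrary examples.

```python
# Sketch for standards 2.3.1-2.3.4 with NumPy.
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

# 2.3.1 Addition, subtraction, scalar multiplication
print(A + B, A - B, 2 * A, sep="\n")

# 2.3.2 Dot product; geometrically, u . v = |u| |v| cos(theta)
u, v = np.array([1.0, 0.0]), np.array([1.0, 1.0])
cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# 2.3.3 Apply a transformation (45-degree rotation) to rows of a data matrix
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
points = np.array([[1.0, 0.0], [0.0, 1.0]])
rotated = points @ R.T

# 2.3.4 Euclidean distance between two vectors
dist = np.linalg.norm(u - v)
print(cos_theta, rotated, dist, sep="\n")
```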

| Standard | Lessons |
|---|---|
| 3.1.1 Create and manipulate (e.g., sort, filter, aggregate, reshape, merge, extract, clean, transform, subset) one-dimensional data structures for computational analysis (e.g., lists, arrays, series). | |
| 3.1.2 Create and manipulate (e.g., transpose, join, slice, pivot, reshape) two-dimensional data structures for organizing structured datasets (e.g., matrices, DataFrames). | |
| 3.1.3 Utilize operations (e.g., arithmetic, aggregations, transformations) across data structures based on analytical needs. | |
| 3.1.4 Apply indexing methods to select and filter data based on position, labels, and conditions. | |
| 3.2.1 Import data into a DataFrame from common spreadsheet formats (e.g., CSV, XLSX). | |
| 3.2.2 Import data into a DataFrame directly from a database (e.g., using the SQLAlchemy library). | |
| 3.2.3 Import data into a DataFrame using web scraping libraries (e.g., Beautiful Soup, Selenium). | |
| 3.2.4 Import data into a DataFrame leveraging API requests (e.g., Requests, urllib). | |
| 3.3.1 Convert between data types as needed for analysis (e.g., strings to numeric values, dates to timestamps, categorical to numeric encoding). | |
| 3.3.2 Convert between structures as needed for analysis (e.g., lists to arrays, arrays to DataFrames). | |
| 3.3.3 Standardize and clean text data (e.g., remove whitespace, correct typos, standardize formats). | |
| 3.3.4 Identify and remove duplicate or irrelevant rows/records. | |
| 3.3.5 Restructure columns/fields for analysis (e.g., splitting, combining, renaming, removing irrelevant data). | |
| 3.3.6 Apply masking operations to filter and select data. | |
| 3.3.7 Handle missing and invalid data values using appropriate methods (e.g., removal, imputation, interpolation). | |
| 3.3.8 Identify and handle outliers using statistical methods. | |
| 3.4.1 Examine data structures using preview and summary methods (e.g., head, info, shape, describe). | |
| 3.4.2 Create new DataFrames by merging or joining two DataFrames. | |
| 3.4.3 Sort and group records based on conditions and/or attributes. | |
| 3.4.4 Create functions to synthesize features from existing variables (e.g., mathematical operations, scaling, normalization). | |
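
Strand 3.1 is illustrated below with pandas (a reasonable assumption given the libraries named in strand 3.2); the names and scores are invented.

```python
# Sketch for standards 3.1.1-3.1.4 with pandas.
import pandas as pd

# 3.1.1 One-dimensional structure: a Series, sorted and filtered
scores = pd.Series([88, 92, 79, 95], index=["ana", "ben", "cal", "dee"])
top = scores.sort_values(ascending=False)[scores > 85]

# 3.1.2 Two-dimensional structure: a DataFrame, reshaped by transposing
df = pd.DataFrame({"math": [88, 92], "science": [79, 95]}, index=["ana", "ben"])
wide = df.T

# 3.1.3 Operations across structures: element-wise arithmetic and aggregation
curved = df + 2
column_means = df.mean()

# 3.1.4 Indexing by label, by position, and by condition
print(df.loc["ana", "math"], df.iloc[0, 0], df[df["math"] > 90], sep="\n")
```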
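
For strand 3.2, the sketch below shows all four import paths; the file names, connection string, URLs, and table name are placeholders, so swap in real sources before running.

```python
# Sketch for standards 3.2.1-3.2.4; every data source here is a placeholder.
import pandas as pd
import requests
from bs4 import BeautifulSoup
from sqlalchemy import create_engine

# 3.2.1 Spreadsheet formats
df_csv = pd.read_csv("sales.csv")             # hypothetical CSV file
df_xlsx = pd.read_excel("sales.xlsx")         # hypothetical XLSX file (needs openpyxl)

# 3.2.2 Database via SQLAlchemy
engine = create_engine("sqlite:///sales.db")  # hypothetical SQLite database
df_sql = pd.read_sql("SELECT * FROM orders", engine)

# 3.2.3 Web scraping with Beautiful Soup: collect table cells into rows
html = requests.get("https://example.com/report.html").text  # placeholder URL
soup = BeautifulSoup(html, "html.parser")
rows = [[td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in soup.find_all("tr")]
df_scraped = pd.DataFrame(rows)

# 3.2.4 API request returning JSON records
resp = requests.get("https://example.com/api/orders")        # placeholder endpoint
df_api = pd.DataFrame(resp.json())
```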
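
Strand 3.3 is sketched on a tiny synthetic DataFrame; median imputation and the 1.5 * IQR outlier rule are common defaults here, and the right choice is context-dependent.

```python
# Sketch for standards 3.3.1-3.3.8 on synthetic data.
import pandas as pd

df = pd.DataFrame({
    "name": [" Ana ", "Ben", "Ben", "Cal"],
    "amount": ["10.5", "200", "200", None],
})

# 3.3.3 Standardize text: strip whitespace, normalize case
df["name"] = df["name"].str.strip().str.title()

# 3.3.1 Convert types: strings to numbers (invalid entries become NaN)
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

# 3.3.4 Remove duplicate rows
df = df.drop_duplicates()

# 3.3.7 Impute missing values (median here; removal or interpolation also fit)
df["amount"] = df["amount"].fillna(df["amount"].median())

# 3.3.6 Masking: boolean filter to select rows
small = df[df["amount"] < 100]

# 3.3.8 Flag outliers with the 1.5 * IQR rule
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)]
print(df, outliers, sep="\n")
```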
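
And for strand 3.4: previewing, merging, grouping, and a feature-building function, again on invented data; min-max scaling stands in for the "synthesize features" standard.

```python
# Sketch for standards 3.4.1-3.4.4 with pandas.
import pandas as pd

orders = pd.DataFrame({"customer": ["ana", "ben", "ana"], "total": [20.0, 35.0, 15.0]})
regions = pd.DataFrame({"customer": ["ana", "ben"], "region": ["north", "south"]})

# 3.4.1 Preview and summarize
print(orders.head(), orders.shape, orders.describe(), sep="\n")

# 3.4.2 Merge two DataFrames on a shared key
merged = orders.merge(regions, on="customer", how="left")

# 3.4.3 Sort, then group and aggregate
by_region = merged.sort_values("total").groupby("region")["total"].sum()

# 3.4.4 A function that synthesizes a feature (min-max scaling)
def min_max(col: pd.Series) -> pd.Series:
    return (col - col.min()) / (col.max() - col.min())

merged["total_scaled"] = min_max(merged["total"])
```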

| Standard | Lessons |
|---|---|
| 4.1.1 Generate histograms and density plots to display data distributions. | |
| 4.1.2 Create box plots and violin plots to show data spread and quartiles. | |
| 4.1.3 Construct Q-Q plots to assess data normality. | |
| 4.2.1 Generate scatter plots and pair plots to show relationships between variables. | |
| 4.2.2 Generate correlation heatmaps to display feature relationships. | |
| 4.2.3 Plot decision boundaries to visualize data separations. | |
| 4.3.1 Generate bar charts and line plots to compare categorical data. | |
| 4.3.2 Create heat maps to display confusion matrices and tabular comparisons. | |
| 4.3.3 Plot ROC curves and precision-recall curves to evaluate classifications. | |
| 4.4.1 Generate line plots to show trends over time. | |
| 4.4.2 Create residual plots to analyze prediction errors. | |
| 4.4.3 Plot moving averages and trend lines. | |
| 4.5.1 Draw conclusions by interpreting statistical measures (e.g., p-values, confidence intervals, hypothesis test results). | |
| 4.5.2 Evaluate model performance using appropriate metrics and visualizations (e.g., R-squared, confusion matrix, residual plots). | |
| 4.5.3 Identify patterns, trends, and relationships in data visualizations (e.g., correlation strength, outliers, clusters). | |
| 4.5.4 Draw actionable insights from analysis results. | |
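
A single matplotlib figure can touch each of strands 4.1 through 4.4; the sketch below assumes matplotlib and synthetic data, and shows one representative plot per strand (histogram, scatter plot, bar chart, and a line plot with a moving average).

```python
# Sketch with one representative plot each for strands 4.1-4.4.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# 4.1.1 Distribution: histogram of a normal sample
axes[0, 0].hist(rng.normal(size=500), bins=30)
axes[0, 0].set_title("Histogram")

# 4.2.1 Relationship: scatter plot of a noisy linear relationship
x = rng.uniform(size=100)
axes[0, 1].scatter(x, 2 * x + rng.normal(scale=0.2, size=100))
axes[0, 1].set_title("Scatter")

# 4.3.1 Comparison: bar chart of categorical counts
axes[1, 0].bar(["a", "b", "c"], [3, 7, 5])
axes[1, 0].set_title("Bar")

# 4.4.1/4.4.3 Trend: line plot plus a 7-point moving average
y = rng.normal(size=60).cumsum()
ma = np.convolve(y, np.ones(7) / 7, mode="valid")
axes[1, 1].plot(y, label="series")
axes[1, 1].plot(range(3, 3 + len(ma)), ma, label="7-pt MA")
axes[1, 1].legend()
axes[1, 1].set_title("Trend")

fig.tight_layout()
plt.show()
```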

| Standard | Lessons |
|---|---|
| 5.1.1 Describe the key characteristics of Big Data (e.g., Volume, Velocity, Variety, Veracity). | |
| 5.1.2 Identify real-world applications of Big Data across industries (e.g., healthcare, finance, retail, social media). | |
| 5.1.3 Analyze case studies of successful and unsuccessful Big Data implementations across industries (e.g., recommendation systems, fraud detection, predictive maintenance). | |
| 5.1.4 Identify common Big Data platforms and tools (e.g., Hadoop for distributed storage, Spark for data processing, Tableau for visualization, MongoDB for unstructured data). | |
| 5.2.1 Describe how organizations store structured and unstructured data. | |
| 5.2.2 Compare different types of data storage systems (e.g., data warehouses, data lakes, databases). | |
| 6.1.1 Contrast supervised and unsupervised learning. | |
| 6.1.2 Differentiate between classification and regression problems. | |
| 6.1.3 Evaluate model performance using appropriate metrics (e.g., accuracy, precision/recall, mean squared error, R-squared). | |
| 6.2.1 Perform linear regression for prediction problems. | |
| 6.2.2 Perform multiple regression for prediction problems. | |
| 6.2.3 Perform logistic regression for classification tasks. | |
| 6.2.4 Implement Naive Bayes classification using probability concepts. | |
| 6.2.5 Perform k-means clustering using distance metrics. | |
| 6.3.1 Apply standard methods to split data into training and testing sets. | |
| 6.3.2 Apply cross-validation techniques (e.g., k-fold, leave-one-out, stratified k-fold). | |
| 6.3.3 Identify and address overfitting/underfitting. | |
| 6.3.4 Select appropriate models based on data characteristics and problem requirements. | |
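
Strand 6.2 is sketched below with scikit-learn on synthetic data (the framework does not prescribe a library); each model is fit with default settings.

```python
# Sketch for standards 6.2.1-6.2.5 with scikit-learn on synthetic data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))

# 6.2.1/6.2.2 (Multiple) linear regression on a noisy linear target
y_reg = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)
lin = LinearRegression().fit(X, y_reg)
print("linear coefficients:", lin.coef_)

# 6.2.3 Logistic regression on a binary target
y_clf = (X[:, 0] + X[:, 1] > 0).astype(int)
log_reg = LogisticRegression().fit(X, y_clf)
print("logistic training accuracy:", log_reg.score(X, y_clf))

# 6.2.4 Gaussian Naive Bayes, which applies Bayes' Theorem per feature
nb = GaussianNB().fit(X, y_clf)
print("naive Bayes training accuracy:", nb.score(X, y_clf))

# 6.2.5 k-means clustering, which assigns points by Euclidean distance to centers
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster centers:\n", km.cluster_centers_)
```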
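
Strand 6.3's splitting and cross-validation standards, again with scikit-learn and synthetic data; a training score noticeably higher than the cross-validated score is one practical sign of overfitting (6.3.3).

```python
# Sketch for standards 6.3.1-6.3.3 with scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 6.3.1 Hold out 20% of the data for testing (stratified to preserve class balance)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
model = LogisticRegression().fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))

# 6.3.2 5-fold cross-validation; stratified folds are the default for classifiers
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```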