Redesign: Data Science Intro in R - Data Manipulation + Live Coding

Example of a redesign for a lesson script with live coding exercises
Author

Ilya Musabirov

Published

November 1, 2022

Note

This learning design example demonstrates the redesign of a small chunk of a traditional lab with live coding, which we improve from the standpoint of the 4C/ID learning design framework and active learning.

It is based on a module of an introductory, bootcamp-style R course oriented toward non-STEM students.

The original lab design was based on live coding, classwork, and homework; the goal of the redesign is to demonstrate how small improvements can potentially help improve the learning experience (LX) and learning outcomes.

Applying learning design principles

See the 4C/ID model component schema for a quick graphical overview of the 4C/ID learning design model.

Our instructional model asks us to:

  • Distinguish between authentic (whole) tasks and part-task practice for recurrent skills
  • Define supporting and procedural information for each task
  • Define task classes with decreased scaffolding

In addition to that, we also want to:

  • give students variation in difficulty to support different speeds of progress
  • support the additional goal of teaching students to work with typical errors, extending the reasons for which the Carpentries teaching model suggests live coding
  • define potential points where simple educational technology interventions can help improve the learning experience

Defining necessary student background

At this stage students should have covered and practiced:

  • variables, functions (concept, black and gray box interpretations, calling), assignment
  • packages as function/object containers, package::function
  • rmarkdown/quarto (building, key formats)
  • data loading and saving (native R formats, CSV)
  • key descriptive stats: e.g. mean, median, sd, quantiles
  • pipe and key dplyr verbs: select, mutate, group_by, summarise, arrange
  • simple part-task practice with the verbs in the console and Tidy Data Tutor, ideally including pivot_wider and pivot_longer (a short recap sketch appears below)

E.g., we briefly covered these topics, approximately following:

  • https://r4ds.hadley.nz/data-transform.html
  • https://r4ds.hadley.nz/workflow-pipes.html
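
As a quick recap of this background, here is a minimal sketch on a small made-up tibble (the data and column names are illustrative only, not from the lesson dataset):

```r
library(dplyr)

# A tiny illustrative dataset (made up for this recap)
grades <- tibble(
  student = c("A", "B", "C", "D"),
  major   = c("econ", "econ", "soc", "soc"),
  score   = c(71, 88, 64, 92)
)

# Pipe + key dplyr verbs + basic descriptive stats
grades |>
  mutate(passed = score >= 70) |>   # create a new column
  group_by(major) |>                # group
  summarise(
    mean_score   = mean(score),     # key descriptive stats
    median_score = median(score),
    sd_score     = sd(score)
  ) |>
  arrange(desc(mean_score))         # sort
```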

Defining what the instructor should practice (or provide for students to practice) before class [Procedural information: recurrent aspects to practise/review]

  • search for help
  • select columns
  • create new columns
  • group
  • summarise
  • sort (by one criterion, by multiple criteria, descending)
  • reshape to wider format (for people) / longer (for code)
  • build, break, and visualize pipelines (see the warm-up sketch after this list)
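
One possible warm-up along these lines, assuming the placement data has been loaded into a `placement` data frame with columns such as `gender`, `status`, and `salary` (the object name, file name, and column names are assumptions based on the Kaggle Campus Recruitment file and should be checked against the actual data):

```r
library(dplyr)
library(tidyr)

# placement <- readr::read_csv("Placement_Data_Full_Class.csv")  # adjust path/filename

?dplyr::summarise                              # search for help

placement |>
  select(gender, status, salary) |>            # select columns
  mutate(salary_k = salary / 1000) |>          # create a new column
  group_by(gender, status) |>                  # group
  summarise(
    n = n(),                                   # summarise
    median_salary_k = median(salary_k, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(status, desc(median_salary_k)) |>    # sort by multiple criteria, desc
  pivot_wider(                                 # reshape to wider format (for people)
    names_from  = status,
    values_from = c(n, median_salary_k)
  )
# To "break" the pipeline, run it up to an intermediate step and inspect the result.
```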

Providing students with background information

For extra part-task practice:

  • TidyDataTutor

As a source of procedural information:

  • R4DS

Defining authentic (whole) task

4C/ID suggests that we put our lessons in the context of whole tasks, communicating what students will need to deal with on the job. This does not mean the tasks cannot be adapted. They should be! Proper scaffolding and sequencing are crucial. However, we want to preserve the authentic connection.

Here is an example of how we can define/communicate such a task for students:

What are we working with, doc? [Context]

We will work with the Campus Recruitment (Academic and Employability Factors influencing placement) dataset, which contains (presumably simulated) data on student background and factors influencing placement.

Task

A traditional task for analysts would be to find interesting patterns or disparities in employment data.

Tips for reasoning, decision-making, and problem-solving [Supportive information]

At this stage we will focus on how simple and powerful data aggregation techniques can help us:

  • understand hidden data patterns
  • formulate hypotheses for future analysis
  • bring some ideas back to senior analysts and stakeholders.

Remember:

  • Analytics is about comparisons
  • We might be interested in disparities, e.g. based on gender or work experience (as in the sketch below)
  • We are more likely to be interested in general patterns than precise numbers, at least until we can evaluate uncertainty
  • Our ultimate goal is to support decision-making, balancing precision and detail
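
For example, a first pass at such a comparison might look like the following, again assuming the data is loaded as `placement` with `gender`, `workex`, and `salary` columns (names assumed from the Kaggle dataset; missing salaries, e.g. for unplaced candidates, are simply dropped here):

```r
library(dplyr)

# Median salary by gender and work experience: a simple disparity check
placement |>
  group_by(gender, workex) |>
  summarise(
    n             = n(),
    median_salary = median(salary, na.rm = TRUE),
    .groups       = "drop"
  )
```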

Data

Explore your data description:

Context and Lesson Layout

Main content of the lesson

The actual script we would use for coding is here. It is structured to balance revision, task solving and reflection.

One key addition to live coding for facilitating active learning is a log of typical errors. Understanding how to react to exceptional situations in programming environments is a crucial skill for novices.

It is also a skill to be automated, so we need to provide part-task practice. The simplest way to do that is for the instructor to keep a Typical Errors file or section shared with students. Each time the instructor encounters an error (by design or not), both the error message and the way it was resolved go into the log.

We encourage students to share their own errors in the simplest possible way (copy-paste or even screenshots), to curate them, and to engage with them until reacting to them becomes automatic.

Example life cycle might be:

Screenshot sent by a student -> adding it to the file -> discussing how to fix it -> reviewing at the start of the next lesson -> retrieving it in random later lessons (or using the bot)

Example error log for this lesson
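
As an illustration only (not the actual log linked above), an entry in such a Typical Errors file might look roughly like this, based on a common base-R error:

```r
# --- Typical error: calling a dplyr verb without loading the package -----
# Symptom (roughly as seen in the console):
#   Error in summarise(...) : could not find function "summarise"
# Diagnosis: dplyr is installed but not attached in this session.
# Fix: load the package first, or use the package:: prefix.
library(dplyr)
# dplyr::summarise(...)
```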


What else would we do during the lesson?

  • Review filtering and logical rules, ifelse/case_when
  • Combine aggregation and logical rules, e.g. highlighting the largest disparities (gender-major, maybe based on salary)
  • Produce a simple report (e.g. a gt table with color highlighting of the largest/smallest disparities; see the sketch after this list)
  • Remind students again that we are looking for interesting patterns and need inference to figure out what really exists
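
A sketch of what the report step could look like, combining case_when with a gt table. The column names (`degree_t` for undergrad major, `status` with the value "Placed") and the flag thresholds are assumptions to be adapted to the actual data:

```r
library(dplyr)
library(gt)

placement |>
  group_by(gender, degree_t) |>                # gender x undergrad major
  summarise(
    placement_rate = mean(status == "Placed"),
    .groups = "drop"
  ) |>
  mutate(flag = case_when(                     # logical rules on the aggregate
    placement_rate < 0.5 ~ "low",
    placement_rate < 0.8 ~ "medium",
    TRUE                 ~ "high"
  )) |>
  gt() |>
  tab_header(title = "Placement rate by gender and undergrad major") |>
  tab_style(                                   # highlight the largest disparities
    style = cell_fill(color = "lightyellow"),
    locations = cells_body(rows = flag == "low")
  )
```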

What’s next [Global schema]

  • dplyr -> dbplyr -> sql
  • visualization (principles of viz for analytics)
  • dashboards
  • reports

Task classes

Tasks do not come alone. Part of our redesign is to define examples of different task classes in a similar context. We can use these to provide diverse extra practice during the tutorial, for homework, or as a task base for EdTech environments supporting student progression in mastering the skill.

While I demonstrate alternative tasks using the same data/context, it is recommended to vary them to improve complex skill formation and transfer.

Tasks and classes
| simple aggregation for comparison | aggregation with multiple groups | aggregation with complex/custom function for advanced comparison |
|---|---|---|
| What is the median degree percentage of placed vs non-placed people? | What are median degree and mba percentages of placed vs non-placed people? | What are 0.1, 0.5, 0.9 quantiles for the degree percentage of placed vs non-placed people? |
| What are top 3 undergrad majors for placed vs non-placed? | What are ranks of top 3 undergrad majors for non-placed among placed? | |
| What are the median salaries for people with and without work experience? | What are the median salaries for female and male candidates with and without work experience? | What are 0.2, 0.5, 0.8 quantile salaries for female and male candidates with and without work experience? |
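
For instance, the most advanced class in the last row could be approached roughly as follows (column names again assumed from the Kaggle dataset):

```r
library(dplyr)

# Quantile salaries by gender and work experience (advanced comparison class)
placement |>
  group_by(gender, workex) |>
  summarise(
    q20 = quantile(salary, 0.2, na.rm = TRUE),
    q50 = quantile(salary, 0.5, na.rm = TRUE),
    q80 = quantile(salary, 0.8, na.rm = TRUE),
    .groups = "drop"
  )
```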

What are the potential EdTech integration scenarios?

Depending on our time and the level of mastery we are aiming for, we can use active learning principles and our 4C/ID model to choose among existing technologies or create new ones. The key principle is cost-benefit analysis, keeping the primary focus on the learning goal the technology should support.

We should also strive to integrate EdTech into existing student workflows rather than trying to recenter students' attention on multiple artificial entry points, as that decreases the use of EdTech.

Some simple examples for this lesson:

  • Working on recurrent skills to be automated: Tidy data tutor
  • Expanding feedback for students in the process of data manipulation: tidylog (see the sketch after this list)
  • Chat bot integration, allowing students to self-test recognizing and reacting to typical errors, and to share and exchange error messages
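
For example, tidylog wraps the usual dplyr/tidyr verbs and prints a short message about what each step did; a minimal sketch, using the same assumed `placement` data as above:

```r
library(dplyr)
library(tidylog)   # load after dplyr so its wrapped verbs take precedence

placement |>
  filter(status == "Placed") |>   # tidylog reports how many rows were removed
  group_by(gender) |>             # ...and which grouping variables were used
  summarise(median_salary = median(salary, na.rm = TRUE))
```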