Redesign: Data Science Intro in R - Data Manipulation + Live Coding

Example of a redesign for a lesson script with live coding exercises
Author

Ilya Musabirov

Published

November 1, 2022

Note

This learning design example demonstrates the redesign of a small chunk of a traditional lab with live coding, which we improve from the standpoint of the 4C/ID learning design framework and active learning.

It is based on a module of an introductory, bootcamp-style R course oriented toward non-STEM students.

The original lab design was based on live coding, classwork, and homework; the goal of the redesign is to demonstrate how small improvements can potentially help improve the learning experience (LX) and learning outcomes.

Applying learning design principles

See the 4C/ID model component schema for a quick graphical overview of the 4C/ID learning design model.

Our instructional model asks us to:

  • Distinguish between authentic (whole) tasks and part-task practice for recurrent skills
  • Define supporting and procedural information for each task
  • Define task classes with decreased scaffolding

In addition to that, we also want to:

  • give students variation in difficulty to support different speeds of progress
  • support the additional goal of teaching students to work with typical errors, extending the reasons for which the Carpentries teaching model suggests live coding
  • define potential points where simple educational technology interventions can help improve the learning experience

Defining necessary student background

At this stage students should have covered and practiced:

  • variables, functions (concept, black and gray box interpretations, calling), assignment
  • packages as function/object containers, package::function
  • rmarkdown/quarto (building, key formats)
  • data loading and saving (native R formats, CSV)
  • key descriptive stats: e.g. mean, median, sd, quantiles
  • pipe and key dplyr verbs: select, mutate, group_by, summarise, arrange
  • simple part-task practice with the verbs in the console and Tidy Data Tutor, ideally including pivot_wider and pivot_longer (a short recap sketch appears below)

E.g., we briefly covered these topics, approximately following:

  • https://r4ds.hadley.nz/data-transform.html
  • https://r4ds.hadley.nz/workflow-pipes.html
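
As a quick recap of this background, here is a minimal sketch on a small made-up tibble (the data and column names are illustrative only, not from the lesson dataset):

```r
library(dplyr)

# A tiny illustrative dataset (made up for this recap)
grades <- tibble(
  student = c("A", "B", "C", "D"),
  major   = c("econ", "econ", "soc", "soc"),
  score   = c(71, 88, 64, 92)
)

# Pipe + key dplyr verbs + basic descriptive stats
grades |>
  mutate(passed = score >= 70) |>   # create a new column
  group_by(major) |>                # group
  summarise(
    mean_score   = mean(score),     # key descriptive stats
    median_score = median(score),
    sd_score     = sd(score)
  ) |>
  arrange(desc(mean_score))         # sort
```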

Defining what the instructor should practice (or provide for students to practice) before class [Procedural information: recurrent aspects to practise/review]

  • search for help
  • select columns
  • create new columns
  • group
  • summarise
  • sort (by one criterion, by multiple criteria, descending)
  • reshape to wider format (for people) / longer (for code)
  • build, break, and visualize pipelines (see the warm-up sketch after this list)
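
One possible warm-up along these lines, assuming the placement data has been loaded into a `placement` data frame with columns such as `gender`, `status`, and `salary` (the object name, file name, and column names are assumptions based on the Kaggle Campus Recruitment file and should be checked against the actual data):

```r
library(dplyr)
library(tidyr)

# placement <- readr::read_csv("Placement_Data_Full_Class.csv")  # adjust path/filename

?dplyr::summarise                              # search for help

placement |>
  select(gender, status, salary) |>            # select columns
  mutate(salary_k = salary / 1000) |>          # create a new column
  group_by(gender, status) |>                  # group
  summarise(
    n = n(),                                   # summarise
    median_salary_k = median(salary_k, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(status, desc(median_salary_k)) |>    # sort by multiple criteria, desc
  pivot_wider(                                 # reshape to wider format (for people)
    names_from  = status,
    values_from = c(n, median_salary_k)
  )
# To "break" the pipeline, run it up to an intermediate step and inspect the result.
```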

Providing students with background information

For extra part-task practice:

  • TidyDataTutor

As a source of procedural information:

  • R4DS

Defining authentic (whole) task

4C/ID suggests that we put our lessons in the context of whole tasks, communicating what students will need to deal with on the job. This does not mean the tasks cannot be adapted. They should be! Proper scaffolding and sequencing are crucial. However, we want to preserve the authentic connection.

Here is an example of how we can define/communicate such a task for students:

What are we working with, doc? [Context]

We will work with the Campus Recruitment (Academic and Employability Factors influencing placement) dataset, which contains (presumably simulated) data on student background and factors influencing placement.

Task

A traditional task for analysts would be to find interesting patterns or disparities in employment data.

Tips for reasoning, decision-making, and problem-solving [Supportive information]

At this stage we will focus on how simple and powerful data aggregation techniques can help us:

  • understand hidden data patterns
  • formulate hypotheses for future analysis
  • bring some ideas back to senior analysts and stakeholders.

Remember:

  • Analytics is about comparisons
  • We might be interested in disparities, e.g. based on gender or work experience (as in the sketch below)
  • We are more likely to be interested in general patterns than precise numbers, at least until we can evaluate uncertainty
  • Our ultimate goal is to support decision-making, balancing precision and detail
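
For example, a first pass at such a comparison might look like the following, again assuming the data is loaded as `placement` with `gender`, `workex`, and `salary` columns (names assumed from the Kaggle dataset; missing salaries, e.g. for unplaced candidates, are simply dropped here):

```r
library(dplyr)

# Median salary by gender and work experience: a simple disparity check
placement |>
  group_by(gender, workex) |>
  summarise(
    n             = n(),
    median_salary = median(salary, na.rm = TRUE),
    .groups       = "drop"
  )
```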

Data

Explore your data description:

Context and Lesson Layout

Main content of the lesson

The actual script we would use for coding is here. It is structured to balance revision, task solving and reflection.

One key addition to live coding for facilitating active learning is a log of typical errors. Understanding how to react to exceptional situations in programming environments is a crucial skill for novices.

It is also a skill to be automated, so we need to provide part-task practice. The simplest way to do that is for the instructor to keep a Typical Errors file or section shared with students. Each time the instructor encounters an error (by design or not), both the error message and the way it was resolved go into the log.

We encourage students to share their own errors in the simplest possible way (copy-paste or even screenshots), to curate them, and to engage with them until reacting to them becomes automatic.

Example life cycle might be:

Screenshot sent by a student -> adding it to the file -> discussing how to fix it -> reviewing at the start of the next lesson -> retrieving it in random later lessons (or using the bot)

Example error log for this lesson
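
As an illustration only (not the actual log linked above), an entry in such a Typical Errors file might look roughly like this, based on a common base-R error:

```r
# --- Typical error: calling a dplyr verb without loading the package -----
# Symptom (roughly as seen in the console):
#   Error in summarise(...) : could not find function "summarise"
# Diagnosis: dplyr is installed but not attached in this session.
# Fix: load the package first, or use the package:: prefix.
library(dplyr)
# dplyr::summarise(...)
```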


What else would we do during the lesson?

  • Review filtering and logical rules, ifelse/case_when
  • Combine aggregation and logical rules, e.g. highlighting the largest disparities (gender-major, maybe based on salary)
  • Produce a simple report (e.g. a gt table with color highlighting of the largest/smallest disparities; see the sketch after this list)
  • Remind students again that we are looking for interesting patterns and need inference to figure out what really exists
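
A sketch of what the report step could look like, combining case_when with a gt table. The column names (`degree_t` for undergrad major, `status` with the value "Placed") and the flag thresholds are assumptions to be adapted to the actual data:

```r
library(dplyr)
library(gt)

placement |>
  group_by(gender, degree_t) |>                # gender x undergrad major
  summarise(
    placement_rate = mean(status == "Placed"),
    .groups = "drop"
  ) |>
  mutate(flag = case_when(                     # logical rules on the aggregate
    placement_rate < 0.5 ~ "low",
    placement_rate < 0.8 ~ "medium",
    TRUE                 ~ "high"
  )) |>
  gt() |>
  tab_header(title = "Placement rate by gender and undergrad major") |>
  tab_style(                                   # highlight the largest disparities
    style = cell_fill(color = "lightyellow"),
    locations = cells_body(rows = flag == "low")
  )
```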

What’s next [Global schema]

  • dplyr -> dbplyr -> sql
  • visualization (principles of viz for analytics)
  • dashboards
  • reports

Task classes

Tasks do not come alone. Part of our redesign is to define examples of different task classes in a similar context. We can use these to provide diverse extra practice during the tutorial, for homework, or as a task base for EdTech environments supporting student progression in mastering the skill.

While I demonstrate alternative tasks using the same data/context, it is recommended to vary them to improve complex skill formation and transfer.

Tasks and classes
| simple aggregation for comparison | aggregation with multiple groups | aggregation with complex/custom function for advanced comparison |
|---|---|---|
| What is the median degree percentage of placed vs non-placed people? | What are median degree and mba percentages of placed vs non-placed people? | What are 0.1, 0.5, 0.9 quantiles for the degree percentage of placed vs non-placed people? |
| What are top 3 undergrad majors for placed vs non-placed? | What are ranks of top 3 undergrad majors for non-placed among placed? | |
| What are the median salaries for people with and without work experience? | What are the median salaries for female and male candidates with and without work experience? | What are 0.2, 0.5, 0.8 quantile salaries for female and male candidates with and without work experience? |
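
For instance, the most advanced class in the last row could be approached roughly as follows (column names again assumed from the Kaggle dataset):

```r
library(dplyr)

# Quantile salaries by gender and work experience (advanced comparison class)
placement |>
  group_by(gender, workex) |>
  summarise(
    q20 = quantile(salary, 0.2, na.rm = TRUE),
    q50 = quantile(salary, 0.5, na.rm = TRUE),
    q80 = quantile(salary, 0.8, na.rm = TRUE),
    .groups = "drop"
  )
```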

What are the potential EdTech integration scenarios?

Depending on our time and the level of mastery we are aiming for, we can use active learning principles and our 4C/ID model to choose among existing technologies or create new ones. The key principle is cost-benefit analysis, keeping the primary focus on the learning goal the technology should support.

We should also strive to integrate EdTech into existing student workflows rather than trying to recenter students' attention on multiple artificial entry points, as that decreases the use of EdTech.

Some simple examples for this lesson:

  • Working on recurrent skills to be automated: Tidy data tutor
  • Expanding feedback for students in the process of data manipulation: tidylog (see the sketch after this list)
  • Chat bot integration, allowing students to self-test recognizing and reacting to typical errors, and to share and exchange error messages
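
For example, tidylog wraps the usual dplyr/tidyr verbs and prints a short message about what each step did; a minimal sketch, using the same assumed `placement` data as above:

```r
library(dplyr)
library(tidylog)   # load after dplyr so its wrapped verbs take precedence

placement |>
  filter(status == "Placed") |>   # tidylog reports how many rows were removed
  group_by(gender) |>             # ...and which grouping variables were used
  summarise(median_salary = median(salary, na.rm = TRUE))
```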