Lesson #6


This lesson was dedicated to data science thinking. Building on the students' presentations of the asynchronous task, in which they were asked to prepare a presentation about computational thinking and statistical thinking (see Table 2, in Grading policy and submissions), the lesson continued with elaborations and clarifications. Special attention was given to mathematical thinking and the process-object duality. The roles of algorithms and data in data science were (re)addressed as well, along with their respective relationships to computational thinking and statistical thinking.

Due to time constraints, and in order to allocate enough time for the discussion of domain thinking, the domain thinking component of data thinking was postponed to a future lesson (Lesson 8). In preparation for that lesson, the students were asked to work on an asynchronous task on ethics: comparisons between different data science codes of ethics (see Table 2, in Grading policy and submissions).

The discussion on the process-object duality was based on:

  • students’ sharing of the mathematical concepts they had found difficult to understand during their mathematical studies at school and during their undergraduate studies. Among the concepts they mentioned were derivatives and proof by induction. For each of these concepts, we discussed how the process conception and the object conception are expressed and what mathematical problems each conception enables us to solve.
  • students’ answers to the following question: How would you explain to a friend what the KNN algorithm is? (see Q.1 in Figure 4: KNN comprehension questionnaire, Mike and Hazzan, 2022, p. 11). Here are several illustrative answers:
    • Classify something into a category according to its proximity to other known objects.
    • If there were two different groups and you had to choose which group to join, how would you choose? Expected answer: According to what is similar to me.
    • The ability to classify an example according to the examples that are most similar to it.
    • KNN is a machine learning algorithm.
    • You tend to act like your neighbors.
    • KNN is a particular way of classifying a particular data or image according to other examples.
    • Tell me who your neighbors are and I will tell you who you are.
  • students’ answers to the following question (see Q.4 in Figure 4: KNN comprehension questionnaire, Mike and Hazzan, 2022, p. 12):

In order to classify dogs as Poodle or Labrador, four characteristics were selected: height, weight, tail length, and ear length. The training set included 1,000 dogs; 500 of each kind. Based on this data set, we wish to classify an unknown dog using the KNN classifier.

  1. For K=5: How many times is the square operation executed?
  2. For K=11: How many times is the square operation executed?
  3. What conclusion can you draw from your answers to the above two questions?
  4. In your opinion, when are the chances of a correct classification higher?
    1. K=5
    2. K=11
    3. It is impossible to decide
    4. I do not know
  5. Please explain your answer.

In this question, the students are asked to indicate, for K=5 and K=11, how many times the square operation must be executed in a specific classification problem using the KNN algorithm. Although the K values differ, the square operation must be executed anew, once per feature, to find the Euclidean distance between the unknown instance and each instance in the training set. In other words, 4,000 square operations are required in both cases (4 features × 1,000 instances in the training set), regardless of the value of K.
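The count above can be checked with a short Python sketch. It is an illustration only, not the code used in the course or in Mike and Hazzan (2022): the function name `knn_classify`, the helper `make_dogs`, and the synthetic dog measurements are all assumptions made for this example. The sketch ranks neighbors by squared distance, which gives the same ordering as the Euclidean distance, and increments a counter each time a difference is squared.

```python
import random


def knn_classify(train, labels, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    Returns (predicted_label, number_of_square_operations).
    Hypothetical helper written for this illustration.
    """
    squares = 0
    distances = []
    for point, label in zip(train, labels):
        total = 0.0
        for a, b in zip(point, query):
            total += (a - b) ** 2  # one square operation per feature
            squares += 1
        # The squared distance is enough for ranking the neighbors.
        distances.append((total, label))

    # K is used only here, after all the distances have been computed.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    predicted = max(votes, key=votes.get)
    return predicted, squares


def make_dogs(rng, n, means):
    """Generate n synthetic dogs (made-up numbers, for illustration only)."""
    return [[rng.gauss(m, 3.0) for m in means] for _ in range(n)]


rng = random.Random(0)
# Four features per dog: height, weight, tail length, ear length.
poodles = make_dogs(rng, 500, [40, 20, 15, 10])
labradors = make_dogs(rng, 500, [58, 30, 35, 12])
train = poodles + labradors
labels = ["Poodle"] * 500 + ["Labrador"] * 500
unknown_dog = [45.0, 22.0, 20.0, 10.0]

label_5, squares_5 = knn_classify(train, labels, unknown_dog, k=5)
label_11, squares_11 = knn_classify(train, labels, unknown_dog, k=11)
print(squares_5, squares_11)  # 4000 4000
```

In this sketch, K appears only in the slicing step that selects the nearest neighbors, after every distance has already been computed, which is why the counter is the same for K=5 and K=11.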

The initial answers the students gave were 1,000 (ignoring the 4 features) and 2,000 (which assumes the data is represented in a two-dimensional space). Following a class discussion, they understood the correct answer. This observation led us to conclude that we should dedicate a lesson to actual data science content rather than to its pedagogy. Accordingly, the next lesson (Lesson 7) was dedicated to data science problem solving with Python.

References:

Mike, K. and Hazzan, O. (2022). Machine learning for non-major data science students: A white box approach, special issue on Research on Data Science Education, The Statistics Education Research Journal (SERJ) 21(2), Article 10.