How to use this tutorial
This tutorial consists of two basic parts: background information on knowledge space theory (KST), and Shiny apps that let you explore individual concepts of the theory.
For a refresher, read through the texts about KST. Interactive apps are highlighted with dotted light gray borders. In these, you can usually visualize, calculate or otherwise interact with one or more of the previously mentioned aspects.
Give it a try ;)
Acknowledgement
The various apps were developed by participants as part of the TquanT and QHELP programs and were often extended by Cord Hockemeyer. For a more consistent tutorial experience, they were adjusted and redundant content was removed. For more information, the original apps, and further background, please have a look at the linked web pages.
The texts are partly taken from a talk by Jürgen Heller and Florian Wickelmaier, which was presented during TquanT.
Many thanks to all the students who developed the apps as part of the QHELP and TquanT mobility events, and to Alice Maurer for helpful comments and corrections.
If you want to learn more exciting concepts of KST, check out the pages of TquanT and QHELP. There you will find more student-developed apps related to KST.
Deterministic KST
Basic terms
The basis is the characterisation of a knowledge domain through tasks that test knowledge in a specific area. The tasks are dichotomous, i.e. they can be considered either solved or unsolved.
Formally, a knowledge domain is defined by a non-empty set \(Q\) of tasks.
E.g.: \(Q = \{a, b, c, d, e, f\}\)
The knowledge state of a person is represented by the subset \(K\) of tasks from \(Q\) that the person can solve, \(K \subseteq Q\). For example, if a person is able to solve items \(a\), \(c\) and \(d\), then he or she is in the knowledge state \(\{a, c, d\}\).
In principle, \(2^{|Q|}\) potential knowledge states exist. However, the number of knowledge states that actually occur is reduced, since not all subsets of \(Q\) are plausible due to dependencies between the tasks:
- Logical dependencies: the same solution steps that are required for one task also lead to the solution of another task.
- Dependencies of the kind that if one (supposedly difficult) task is solved, it is practically certain that another (supposedly easy) task will also be solved (e.g. because of the order in which the corresponding content is covered in the curriculum).
Formally, a knowledge structure is defined by the pair \((Q, \mathcal{K})\)
- \(Q\): a domain of knowledge
- \(\mathcal{K}\): the set of all plausible knowledge states \(K\), with \(\mathcal{K} \subseteq \mathcal{P}(Q)\) and at least \(\emptyset, Q \in \mathcal{K}\)
For a formal description of the dependencies between the tasks in a knowledge domain \(Q\) one can, in the simplest case, define a binary relation \(\preccurlyeq\) on \(Q\) such that \(p \preccurlyeq q\) if and only if a correct answer to task \(q\) allows one to infer a correct answer to task \(p\).
The relation \(\preccurlyeq\) is called the precedence relation (or the surmise relation).
By definition, the precedence relation \(\preccurlyeq\) on \(Q\) has the following properties:
- \(\forall p \in Q: p \preccurlyeq p\) (reflexive)
- \(\forall p, q, r \in Q: p \preccurlyeq q \land q \preccurlyeq r \Rightarrow p \preccurlyeq r\) (transitive).
A precedence relation \(\preccurlyeq\) on \(Q\) can therefore be regarded as a quasi-order (a generalisation of the weak order and the partial order).
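To make these properties concrete, here is a minimal base-R sketch (not the app's code; all names are illustrative) that represents a relation on \(Q\) as a logical matrix and tests reflexivity and transitivity:

```r
## Represent a relation on Q = {a, ..., f} as a logical matrix M,
## where M[p, q] is TRUE iff p is related to q.
items <- letters[1:6]
M <- matrix(FALSE, 6, 6, dimnames = list(items, items))
M["a", "b"] <- TRUE; M["b", "c"] <- TRUE   # two example pairs
diag(M) <- TRUE                            # add all pairs (p, p)

is_reflexive <- function(M) all(diag(M))

## Transitive: every pair reachable in two steps (Boolean product M %*% M)
## must already be contained in M itself.
is_transitive <- function(M) {
  M2 <- (M %*% M) > 0
  all(M[M2])
}

is_reflexive(M)   # TRUE
is_transitive(M)  # FALSE: a-b and b-c are present, but a-c is not
```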
Properties of relations
Task:
Create a random relation and see what happens when you remove and add elements.
Which elements do you have to add to create a reflexive relation?
Surmise Relation
Task:
Enter a relation by checking the corresponding boxes. Look at the graph of the relation and compare it with the graph of the corresponding surmise relation next to it. There may be differences. What could be the reason for these differences? (Hint: Think about the necessary properties of a surmise relation.)
Enter your Prerequisite Relation!
XY: X is in relation to Y. E.g. 13 means that item 1 is a prerequisite for item 3.
Incidence Matrices
A red background of a cell indicates that the entered relation was not transitive or reflexive and that element(s) were added accordingly to create a transitive and reflexive surmise relation.
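What the app does when cells turn red can be sketched in base R: complete the entered relation by adding the reflexive pairs and the transitive closure, here via Warshall's algorithm applied to the matrix representation from the sketch above. This is an illustration, not the app's actual code.

```r
## Complete a prerequisite relation to a surmise relation:
## add reflexivity, then the transitive closure (Warshall's algorithm).
surmise_closure <- function(M) {
  diag(M) <- TRUE
  n <- nrow(M)
  for (k in 1:n)
    for (i in 1:n)
      M[i, ] <- M[i, ] | (M[i, k] & M[k, ])
  M
}

## Cells that the app would mark in red:
added <- surmise_closure(M) & !M
```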
A knowledge structure \((Q, \mathcal{K})\) that is closed with respect to set union is called a knowledge space.
Assumptions about the underlying learning process are:
- If a person is ready to learn an item, then a person who can solve additional items has already learned that item or is also ready to learn it (alternatively: if a person is ready to learn an item, then he or she remains so).
- This implies closure with respect to set union (see the sketch after this list).
- From each knowledge state, the solution of new items can be learned one at a time.
- This implies a property that allows going from any knowledge state to any other in the knowledge structure in steps of one item (“well-graded”). Not every knowledge space is necessarily well-graded.
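A union-closure check can be written in a few lines of base R; the following is a sketch with a hypothetical five-state structure, not code from any of the apps.

```r
## Represent a knowledge structure as a list of character vectors and
## test whether it is closed under union, i.e. whether it is a space.
K <- list(character(0), "a", "b", c("a", "b"), c("a", "b", "c"))

is_union_closed <- function(K) {
  for (X in K)
    for (Y in K) {
      U <- union(X, Y)
      if (!any(vapply(K, function(Z) setequal(Z, U), logical(1))))
        return(FALSE)   # the union of X and Y is missing from K
    }
  TRUE
}

is_union_closed(K)   # TRUE for this example structure
```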
If the structure is closed with respect to set union and set intersection, then it is called a quasi-ordinal knowledge space.
For quasi-ordinal knowledge spaces there is a one-to-one correspondence between the knowledge structure and precedence relation.
Theorem of G. Birkhoff (1937)
- There is a one-to-one correspondence between the quasi-orders \(\preccurlyeq\) on a set \(Q\) and the families of subsets of \(Q\) which are closed with respect to set union and set intersection (a.k.a. quasi-ordinal knowledge spaces).
- For a given relation \(\preccurlyeq\) on \(Q\), \(K \subseteq Q\) is a knowledge state in \(\mathcal{K}_\preccurlyeq\) if \[(q \in K \land p \preccurlyeq q) \Rightarrow p \in K \;\;\; \forall p, q \in Q\]
- For a given knowledge structure \(\mathcal{K}\), a precedence relation \(\preccurlyeq_\mathcal{K}\) on \(Q\) is given by \[p \preccurlyeq_\mathcal{K} q \Leftrightarrow (q \in K \Rightarrow p \in K) \;\;\; \forall K \in \mathcal{K}\]
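The first direction of the correspondence can be illustrated with a short base-R sketch: given a quasi-order, enumerate all subsets of \(Q\) and keep exactly those satisfying the state condition above. The three-item order used here is made up for illustration.

```r
## Derive the quasi-ordinal knowledge space from a quasi-order on
## Q = {a, b, c}, with the order encoded as a logical matrix.
items <- c("a", "b", "c")
M <- matrix(FALSE, 3, 3, dimnames = list(items, items))
diag(M) <- TRUE
M["a", "b"] <- TRUE   # a <= b: mastering b lets us surmise a

## Keep the subsets K with: q in K and p <= q  =>  p in K
subsets <- expand.grid(rep(list(c(FALSE, TRUE)), 3))
colnames(subsets) <- items
is_state <- apply(subsets, 1, function(K) {
  for (q in items)
    for (p in items)
      if (K[q] && M[p, q] && !K[p]) return(FALSE)
  TRUE
})
subsets[is_state, ]   # 6 of the 8 subsets remain: {b} and {b, c} drop out
```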
Validating Knowledge Structures
Click on the information button (“i”) to get a description of the different fit indices.
How well does the knowledge structure fit the data?
Distance Distribution
Please keep in mind that the maximal possible distance is \(d_{\max} = \lfloor |Q|/2 \rfloor = 2\).
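The distance referred to here is the minimal symmetric-difference distance between a response pattern and the states of the structure. A base-R sketch, with made-up states and patterns over four items:

```r
## Distance of a response pattern r to a structure: the minimum number
## of items in which r differs from any knowledge state (binary coding).
dist_to_structure <- function(r, states) {
  min(apply(states, 1, function(k) sum(r != k)))
}

states <- rbind(c(0,0,0,0), c(1,0,0,0), c(1,1,0,0), c(1,1,1,1))
dist_to_structure(c(1,1,0,0), states)  # 0: the pattern is itself a state
dist_to_structure(c(0,1,0,1), states)  # 2: the closest states differ in 2 items
```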
Deterministic Assessment
Click through the three tabs and follow the instructions. In the beginning it is advisable to use the default structure. In a second step you can adjust this structure and see how it affects the assessment.
In this application, you have two options:
Then, you can simulate being a new student by answering a quiz.
After each of your answers, we will update the list of still eligible possible knowledge states.
These are all the possible response patterns.
As a default structure, we have checked the patterns which belong to the knowledge structure observed in a previous classroom sample. Both the empty set and the full set Q have already been included.
You can use the one available, or create your own.
Don't forget to press the 'Done' button when you are finished.
In the 'quiz' tab, you can experience a deterministic adaptive assessment of your knowledge in our small probability domain. We start with the full knowledge structure. As soon as you answer questions, states are eliminated. In the Hasse diagram on the right, the eliminated and the still eligible knowledge states are shown in different colors.
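The elimination step of the quiz can be sketched as follows (a simplified illustration with a hypothetical three-item chain structure, not the app's code): a correct answer rules out all states not containing the item, an incorrect answer rules out all states containing it.

```r
## States as rows of a binary matrix over the items a, b, c.
states <- rbind(c(0,0,0), c(1,0,0), c(1,1,0), c(1,1,1))
colnames(states) <- c("a", "b", "c")

## Keep only the states consistent with the answer to one item.
eliminate <- function(states, item, correct) {
  states[states[, item] == as.integer(correct), , drop = FALSE]
}

states <- eliminate(states, "b", correct = TRUE)   # rules out {} and {a}
states <- eliminate(states, "c", correct = FALSE)  # only {a, b} remains
```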
Please answer the following questions
The Assessment is completed.
Still eligible knowledge states
Probabilistic KST
Motivation
- In practical applications we cannot assume that a student’s response to an item is correct if and only if the student masters it (i.e. if the item is an element of the respective knowledge state).
- There are two types of response errors:
- careless error, i.e. the response is incorrect although the item is contained in the knowledge state
- lucky guess, i.e. the response is correct although the item is not contained in the knowledge state.
- In any case, we need to dissociate knowledge states and response patterns.
- \(R\) denotes a response pattern, which is a subset of \(Q\)
- \(\mathcal{R} = 2^Q\) denotes the set of all possible response patterns
- In this approach
- the knowledge state K is a latent construct
- the response pattern R is a manifest indicator of the knowledge state.
- The conclusion from the observable response pattern to the unobservable knowledge state can only be of a stochastic nature.
- This requires introducing a probabilistic framework.
A probabilistic knowledge structure (PKS) is defined by specifying
- a knowledge structure \(\mathcal{K}\) on a knowledge domain \(Q\) (i.e. a collection \(\mathcal{K} \subseteq 2^Q\) with \(\emptyset, Q \in \mathcal{K}\))
- a (marginal) distribution \(P(K)\) on the knowledge states \(K \in \mathcal{K}\)
- the conditional probabilities \(P(R | K)\) to observe response pattern \(R\) given knowledge state \(K\) \[P(R | K) = \frac{P(R, K)}{ P(K)}\]
The probability of the response pattern \(R \in \mathcal{R} = 2^Q\) is predicted by (cf. law of total probability)
\[P(R) = \sum\limits_{K \in \mathcal{K}} P(R | K) \cdot P(K)\]
Local Stochastic Independence
Assumptions:
- Given the knowledge state \(K\) of a person,
  - the responses are stochastically independent over problems,
  - the response to each problem \(q\) only depends on the probabilities
    - \(\beta_q\) of a careless error
    - \(\eta_q\) of a lucky guess
- The probability of the response pattern \(R\) given the knowledge state \(K\) reads
\[P(R | K) = \left(\prod\limits_{q \in K \setminus R} \beta_q \right) \cdot \left(\prod\limits_{q \in K \cap R} (1 - \beta_q) \right) \cdot \left(\prod\limits_{q \in R \setminus K} \eta_q \right) \cdot \left(\prod\limits_{q \in \overline{R} \cap \overline{K}} (1 - \eta_q) \right)\]
A PKS satisfying these assumptions is called a basic local independence model (BLIM).
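The two formulas above translate directly into base R. The following sketch (illustrative names, binary coding of \(R\) and \(K\)) computes \(P(R \mid K)\) and the predicted \(P(R)\):

```r
## BLIM: probability of response pattern r given knowledge state k,
## with per-item error rates beta (careless error) and eta (lucky guess).
p_R_given_K <- function(r, k, beta, eta) {
  prod(beta[k == 1 & r == 0]) *       # careless errors on mastered items
  prod(1 - beta[k == 1 & r == 1]) *   # correct responses on mastered items
  prod(eta[k == 0 & r == 1]) *        # lucky guesses on unmastered items
  prod(1 - eta[k == 0 & r == 0])      # no lucky guess on unmastered items
}

## Law of total probability: P(R) = sum over states of P(R|K) * P(K).
p_R <- function(r, states, pi, beta, eta) {
  sum(apply(states, 1, function(k) p_R_given_K(r, k, beta, eta)) * pi)
}

## Example with a hypothetical three-item structure:
states <- rbind(c(0,0,0), c(1,0,0), c(1,1,0), c(1,1,1))
pi <- c(0.4, 0.3, 0.2, 0.1)
p_R(c(1,1,0), states, pi, beta = rep(0.1, 3), eta = rep(0.05, 3))
```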
Simulating BLIM using predefined structures
Task: Choose a structure and see how the simulated response patterns change when the error rates are modified.
Example Spaces
As example data, knowledge spaces provided by the R package pks (Heller & Wickelmaier, 2013; Wickelmaier et al., 2016) are used. Concretely, the following spaces are used:
- Density
- Taagepera et al. (1997) applied knowledge space theory to specific science problems. The density test was administered to 2060 students; a substructure of five items is included here.
- Matter
- Taagepera et al. (1997) applied knowledge space theory to specific science problems. The conservation of matter test was administered to 1620 students; a substructure of five items is included here.
- Doignon & Falmagne
- Fictitious data set from Doignon and Falmagne (1999, chap. 7).
References
Doignon, J.-P., & Falmagne, J.-C. (1999). Knowledge spaces. Berlin: Springer.

Heller, J., & Wickelmaier, F. (2013). Minimum discrepancy estimation in probabilistic knowledge structures. Electronic Notes in Discrete Mathematics, 42, 49-56.

Schrepp, M., Held, T., & Albert, D. (1999). Component-based construction of surmise relations for chess problems. In D. Albert & J. Lukas (Eds.), Knowledge spaces: Theories, empirical research, and applications (pp. 41-66). Mahwah, NJ: Erlbaum.

Taagepera, M., Potter, F., Miller, G. E., & Lakshminarayan, K. (1997). Mapping students' thinking patterns by the use of knowledge space theory. International Journal of Science Education, 19, 283-302.

Wickelmaier, F., Heller, J., & Anselmi, P. (2016). pks: Probabilistic Knowledge Structures. R package version 0.4-0. https://CRAN.R-project.org/package=pks

About the Theory

In Knowledge Space Theory, a knowledge structure 𝒦 is any family of subsets of a set Q of items (or test problems) which contains the empty set {} and the full item set Q. Such a knowledge structure can be used together with the BLIM to simulate response patterns.

The Basic Local Independence Model (BLIM)

Assumption: Given the knowledge state K of a person, the responses are stochastically independent over problems, and the response to each problem q ∈ Q only depends on the probabilities βq of a careless error and ηq of a lucky guess for item q. In this app, we simplify the BLIM a bit further by assuming identical β and η values for all items.

Simulating BLIM using items
Follow the instructions and click through the tabs.
Introduction
This app demonstrates the simulation of response patterns with the BLIM. The app works with a set of five items on elementary arithmetic, following the knowledge space used as a standard example by Doignon & Falmagne (1999).

BLIM Simulation
About this App
This app is an adaptation of an app written by students at the TquanT Seminar 2017 in Deutschlandsberg, Austria. It was adapted by Cord Hockemeyer, University of Graz, Austria. © 2017, Cord Hockemeyer, University of Graz, Austria. TquanT was co-funded by the Erasmus+ Programme of the European Commission.
Maximum Likelihood Estimation
Basics:
- The data consist of a vector specifying for each subject in a sample of size \(N\) the given response pattern.
- From this we can derive the absolute frequencies \(N_R\) of the patterns \(R \in \mathcal{R}\).
- For a given knowledge structure \(\mathcal{K}\) the likelihood is given by \[\mathcal{L}(\beta, \eta, \pi | x) = \prod\limits_{R \in \mathcal{R}} P(R | \beta, \eta, \pi)^{N_R}\] where
  - \(\beta = (\beta_q)_{q \in Q}\)
  - \(\eta = (\eta_q)_{q \in Q}\)
  - \(\pi = (\pi_K)_{K \in \mathcal{K}} \;\; \text{with} \;\; \pi_K = P(K)\)
Determining the maximum likelihood estimates (MLEs) requires computing the partial derivatives of the log-likelihood with respect to each of the parameters collected in the vectors \(\beta, \eta, \pi\).
- The problems concerning the analytical tractability of this derivation arise from the fact that \(P(R | \beta, \eta, \pi)\) actually is the sum \[P(R | \beta, \eta, \pi) = \sum\limits_{K \in \mathcal{K}} P(R, K | \beta, \eta, \pi)\]
- Doignon & Falmagne (1999) thus resort to numerical optimization techniques.
- A (partial) solution to this problem is provided by the so-called EM algorithm (Heller & Wickelmaier, 2013; Stefanutti & Robusto, 2009).
EM algorithm
The EM algorithm is an iterative optimization method for providing MLEs of unknown parameters, which proceeds within a so-called incomplete-data framework.
Considering the given data as incomplete, and (artificially) extending them by including actually unobservable variables (‘unknown data’) often facilitates the computation of the MLEs considerably.
In the present context we assume that for each subject we observe both the given response pattern \(R\) and the respective knowledge state \(K\) (complete data), thus having available the absolute frequencies \(M_{RK}\) of subjects who are in state \(K\) and produce pattern \(R\).
The likelihood of the complete data then reads
\[\mathcal{L}(\beta, \eta, \pi | x, y) = \prod\limits_{R \in \mathcal{R}} \prod\limits_{K \in \mathcal{K}} P(R, K | \beta, \eta, \pi)^{M_{RK}}\]
The terms containing \(\beta\), \(\eta\) and \(\pi\), respectively, can be maximized independently, and MLEs \(\beta^{(t)}, \eta^{(t)}\) and \(\pi^{(t)}\) are obtained.
In a second step the expected frequencies \(M_{RK}\) are calculated from the updated MLEs \(\beta^{(t)}, \eta^{(t)}\) and \(\pi^{(t)}\): \[E(M_{RK}) = N_R \cdot P(K | R, \beta^{(t)}, \eta^{(t)}, \pi^{(t)})\]
By repeating these two steps, the MLEs converge and the final estimators are obtained.
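The two steps can be sketched compactly in base R. The following is a didactic implementation under simplifying assumptions (fixed number of iterations, no convergence check, no safeguards against boundary estimates); it reuses p_R_given_K from the BLIM sketch above and is not the pks implementation.

```r
## EM for the BLIM: R and K are binary pattern/state matrices,
## N_R the observed frequency of each response pattern.
em_blim <- function(R, N_R, K, n_iter = 200) {
  nq <- ncol(K); nk <- nrow(K); nr <- nrow(R); N <- sum(N_R)
  beta <- rep(0.1, nq); eta <- rep(0.1, nq); pi <- rep(1 / nk, nk)
  for (t in 1:n_iter) {
    ## E-step: expected complete-data frequencies m[R, K] = N_R * P(K | R)
    m <- matrix(0, nr, nk)
    for (i in 1:nr) {
      joint <- pi * apply(K, 1, function(k) p_R_given_K(R[i, ], k, beta, eta))
      m[i, ] <- N_R[i] * joint / sum(joint)
    }
    ## M-step: closed-form updates of pi, beta, eta
    pi <- colSums(m) / N
    for (q in 1:nq) {
      in_K <- K[, q] == 1
      beta[q] <- sum(m[R[, q] == 0, in_K]) / sum(m[, in_K])   # careless errors
      eta[q]  <- sum(m[R[, q] == 1, !in_K]) / sum(m[, !in_K]) # lucky guesses
    }
  }
  list(beta = beta, eta = eta, pi = pi)
}
```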
Parameter Estimation
Task: Read through the information on the different estimation methods. Then switch to the second tab.
- First choose the absolute frequencies of the different answer patterns.
- Then decide in step 2 which subsets should be included as knowledge states in the structure.
- (In a first pass you can simply keep the default values of steps 1 and 2.)
- In step 3, estimate and visualize the parameters with the different estimation methods.
- Finally, in steps 4 and 5 you can output the expected frequencies of the response patterns and simulated response frequencies.
The three estimation methods
With a given knowledge structure and observed response patterns, there are three methods to estimate the parameters of a basic local independence model (BLIM):
- Maximum Likelihood (ML) Estimation
- Minimum Discrepancy (MD) Estimation
- Minimum Discrepancy ML Estimation (MDML)
| Method | Principle |
|---|---|
| ML | estimates parameters that maximize the probability of the observed data |
| MD | assumes that any response pattern is generated by the knowledge state closest to it |
| MDML | ML estimation under certain MD restrictions |
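In practice, all three methods are available through the blim() function of the pks package (Wickelmaier et al., 2016). A minimal sketch, assuming the DoignonFalmagne7 example data shipped with the package:

```r
library(pks)
data(DoignonFalmagne7)          # list with states K and frequencies N.R
K   <- DoignonFalmagne7$K
N.R <- DoignonFalmagne7$N.R

blim(K, N.R, method = "ML")     # maximum likelihood (via EM)
blim(K, N.R, method = "MD")     # minimum discrepancy
blim(K, N.R, method = "MDML")   # minimum discrepancy maximum likelihood
```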
For this example, we will be working with a knowledge domain of four items: Q = {a,b,c,d}
1) Set the observed response frequencies of ...
2) Set your knowledge structure
You have selected the following knowledge structure:
3) Calculate your parameter estimates
The error rate estimates for each item are:
The estimates of the state probabilities are:
4) Calculate expected frequencies with each parameter set
5) Simulate response patterns with each parameter set
Probabilistic Knowledge Assessment
Click through the three tabs and follow the instructions. In the beginning it is advisable to use the default structure. In a second step you can adjust this structure and see how it affects the assessment.
How is this different from deterministic assessment? Compare the “Quiz” tab with that of the deterministic app in the previous chapter.
In this application, you have two options:
Then, you can simulate being a new student by answering a quiz.
After each of your answers, we will update the probabilities of your possible knowledge states.
These are all the possible response patterns.
As a default structure, we have checked the patterns which belong to the knowledge structure observed in a previous classroom sample. Both the empty set and the full set Q have already been included.
You can use the one available, or create your own.
Don't forget to press the 'Done' button when you are finished.
In this tab, you can experience an adaptive assessment of your knowledge in our small probability domain. We start with a uniform probability distribution over the knowledge structure you have developed. After each of your answers, the probabilities are updated according to the Bayesian updating rule. On the left side, you see the question and the answer options; on the right side, a Hasse diagram of your knowledge structure indicates the current probability distribution.
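The updating rule can be sketched in base R as follows (a minimal illustration with a hypothetical chain structure and fixed error rates, not the app's code): each state's probability is reweighted by the likelihood of the observed response under that state and then renormalized.

```r
## Bayesian update of the distribution pi over states after a response.
update_posterior <- function(pi, states, item, correct, beta, eta) {
  in_state <- states[, item] == 1
  lik <- ifelse(in_state,
                if (correct) 1 - beta else beta,    # item in the state
                if (correct) eta else 1 - eta)      # item not in the state
  post <- pi * lik
  post / sum(post)
}

states <- rbind(c(0,0,0), c(1,0,0), c(1,1,0), c(1,1,1))
colnames(states) <- c("a", "b", "c")
pi <- rep(1/4, 4)                                   # uniform prior
update_posterior(pi, states, "b", correct = TRUE, beta = 0.1, eta = 0.1)
```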