
Methodological Advances in Assessment

EAPA Digital Event 2021


Using Big Data to Study Individual Differences in the Wild (Keynote)

Sandra Matz, Columbia Business School, NYC

Whether we like it or not, every step we take in the digital environment leaves a footprint. Due to advances in the collection, storage, and processing of large amounts of data, these digital footprints are now available to researchers at little to no cost and can be used as valid cues to human behavior. In my talk, I will explore different ways in which we can use Big Data to study the relationships between people's latent psychological dimensions (e.g. personality) and the decisions they make "in the wild" (e.g. consumption choices or financial decisions). More specifically, I will present research addressing the following questions: (1) What can Big Data tell us about the real-world preferences and decisions of people with different psychological characteristics? (2) How can Big Data be used to predict psychological traits using machine learning? (3) How can the combination of these two Big Data approaches help individuals and businesses make better decisions?
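Question (2) above, predicting psychological traits from digital footprints with machine learning, can be illustrated with a minimal sketch. Everything here is hypothetical: the feature names and data are synthetic stand-ins, not the Facebook Likes or spending records used in the actual research, and logistic regression is just one of many possible models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical digital-footprint features for 500 users:
# column 0: daily social media posts, column 1: share of night-time activity,
# column 2: number of distinct apps used (all standardized, synthetic)
X = rng.normal(size=(500, 3))

# Synthetic binary trait label (e.g. high vs. low extraversion),
# loosely tied to the first feature so the model has signal to learn
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The point of the sketch is the workflow, not the numbers: behavioral traces become feature vectors, self-reported traits become labels, and held-out accuracy indicates how much trait-relevant signal the footprints carry.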

Date: 6 May 2021, 6 p.m. (Berlin time)

Register: Send an email to and state your name, the event(s) you want to attend, and the email address at which we can contact you.


Advancing Assessment through Log Data, Natural Language Processing, and Machine Learning (Symposium)

Fabian Zehner & Carolin Hahnel, DIPF, Germany

When test takers interact with assessment environments, computer-based assessments can capture such interactions as process information in the form of rather complex log data. Similarly, when test takers provide an open-ended response as a text or speech act, the resulting natural language constitutes unstructured information that is challenging to process automatically. Even though log data and natural language, as semi- and unstructured information, place additional methodological demands on assessments, they can contain highly valuable diagnostic information, constituting two streams in modern multi-modal assessments. Both allow individual information to be analyzed objectively at large scale.

The symposium brings together four examples of beneficial uses of log data analysis, natural language processing, and machine learning in assessments, but also discusses challenges that need to be met by ongoing research. Starting with a look back at the expectations we associate with technology-based assessments, Samuel Greiff shares a first conclusion about our current standing with respect to large-scale process analysis. Qiwei He then presents recent results of her research on reading-related processes using log data from the reading literacy assessment of the PISA 2018 study. Afterwards, Art Graesser and colleagues demonstrate the potential of natural language processing for students’ learning about STEM topics, presenting their intelligent tutoring system ElectronixTutor. Finally, through their series of experiments, Andrea Horbach and Torsten Zesch point out some important obstacles in the thoughtful treatment and processing of text responses. Besides the vast potential of these methods, the symposium also touches on their constraints, such as the matter of process indicator validation, current limits of computers’ natural language understanding, and the risks and ethical responsibilities involved in employing machine learning in psychological assessments.


20 years of technology in large-scale assessments: Lost efforts or sustainable change?

Samuel Greiff, University of Luxembourg

Over the last 20 years, educational large-scale assessments have undergone dramatic changes, moving away from simple paper-pencil assessments to innovative, technology-based assessments. This has had implications for obtaining, reporting, and interpreting results on student skills in international comparisons. In fact, on the basis of these innovative simulated assessment environments, news about student rankings, under- and overperforming countries, and novel ideas on how to improve educational systems are prominently featured in the media. This talk will discuss what these new assessment environments, taken together, have brought to research, practice, and policy, and whether the initial high promises have been fulfilled.


Dynamic Navigation in PISA 2018 Reading Assessment: Read, Explore and Interact

Qiwei Britt He, Educational Testing Service (ETS), Princeton

This paper illustrates how students dynamically navigated and allocated their time across the emerging multiple-source reading items in the PISA 2018 reading assessment, and examines the relationship between navigation skills and reading performance in digital environments. The results demonstrate the importance of gaining deeper insight into not only what students responded but also how they arrived at their responses.


Automated Understanding of Natural Language Answers to Questions in ElectronixTutor

Arthur C. Graesser, Colin Carmon, & Brent Morgan, University of Memphis, Psychology and the Institute for Intelligent Systems

ElectronixTutor is an intelligent tutoring system that helps students learn electronics by holding natural language conversations about electronic circuits. When a student answers a question, the system assesses the semantic similarity between the student’s natural language contributions and expected ideal answers. This presentation will describe computational linguistics mechanisms that assess semantic similarity, comparing the agreement between computer and human judgments with the agreement among humans themselves. Advances in computational linguistics have reached the point where they are on par with the semantic judgments of human experts. We have reached a point in history when computers can analyze essays and answers to questions at a level comparable to human instructors, who typically dislike grading such verbal protocols.
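The core idea of scoring a free-text answer against an expected ideal answer can be sketched in a few lines. This is a deliberately simplified stand-in, not ElectronixTutor's actual mechanism: a TF-IDF bag-of-words with cosine similarity, where the example answers are invented for illustration. Real systems use far richer semantic representations.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

ideal_answer = "the voltage across a resistor equals current times resistance"
student_answers = [
    "voltage is current multiplied by resistance",   # close paraphrase
    "electrons are negatively charged particles",    # off-topic
]

# Fit a TF-IDF space over all texts, then score each student answer
# against the ideal answer by cosine similarity
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform([ideal_answer] + student_answers)
scores = cosine_similarity(vectors[0], vectors[1:]).ravel()

for answer, score in zip(student_answers, scores):
    print(f"{score:.2f}  {answer}")
```

The paraphrase scores higher than the off-topic answer because it shares content words with the ideal answer; the sketch also exposes the approach's limit, since a paraphrase with no word overlap would score zero, which is exactly where deeper semantic models come in.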


Automatic Content Scoring – Challenges, Limitations and Pitfalls

Andrea Horbach, Torsten Zesch, University of Duisburg-Essen, Germany

The fully automatic scoring of short free-text answers using methods of natural language processing seems a promising way to reduce human scoring effort in educational scenarios. In our presentation, we take a closer look at the limiting factors that determine the feasibility of such a task. We present experiments on the amount of training data necessary to build a scoring model, and discuss the effect of non-standard learner answers, such as linguistically malformed answers or adversarial input intended to game a scoring system.
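The training-data question can be made concrete with a toy learning curve. The sketch below is hypothetical and not the authors' experimental setup: it generates synthetic short answers from disjoint word pools (so "correct" and "incorrect" answers are easy to separate), trains a simple Naive Bayes scorer at several training set sizes, and reports held-out accuracy at each size.

```python
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

random.seed(0)

on_topic_words = ["ohm", "voltage", "resistance", "current", "circuit"]
off_topic_words = ["magnet", "gravity", "heat", "light", "sound"]

def make_answer(label):
    # Synthetic short answer: correct answers (label 1) draw on-topic words,
    # incorrect ones draw off-topic words, both mixed with filler words
    pool = (on_topic_words if label == 1 else off_topic_words) + ["the", "is", "it"]
    return " ".join(random.choices(pool, k=6))

test_labels = [0, 1] * 50
test_answers = [make_answer(y) for y in test_labels]

for n_train in [10, 50, 200]:
    train_labels = [i % 2 for i in range(n_train)]
    train_answers = [make_answer(y) for y in train_labels]
    vectorizer = CountVectorizer()
    X_train = vectorizer.fit_transform(train_answers)
    X_test = vectorizer.transform(test_answers)
    model = MultinomialNB().fit(X_train, train_labels)
    acc = model.score(X_test, test_labels)
    print(f"n_train={n_train:4d}  accuracy={acc:.2f}")
```

Because the synthetic classes barely overlap, accuracy is high even with little data; real learner answers are far messier, which is precisely why the amount and quality of training data become limiting factors in practice.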


Date: 6 May 2021, 3:30 p.m. (Berlin time)

Register: Send an email to and state your name, the event(s) you want to attend, and the email address at which we can contact you.