March 12th: Several open access preprints this week! Plus AI in reading instruction (some of my thoughts)
The Weekly Email Newsletter That Keeps You Informed of the Latest Reading Research!
Welcome to the Reading Research Recap, a weekly newsletter featuring the latest reading research published in peer-reviewed scientific journals. The goal of the Recap is to share recent scientific findings and foster an appreciation of science as a way to navigate the world. I try to make this one of the most informative emails you get each week. If you enjoy this issue, please share it. I am always interested in improving the newsletter and welcome feedback.
Welcome! This is Issue No. 42
“Science and everyday life cannot and should not be separated.” – Rosalind Franklin
Updates
No updates this week!
Research Highlights
📉📊📈📑📝
Clarifying the Relationship Between Early Speech-Sound Production Abilities and Subsequent Reading Outcomes (open access preprint)
“Speech-sound production accuracy uniquely contributed to the prediction of word reading; whereas full mediation effects of core pre-literacy skills and SES were identified for decoding and fluency. For reading comprehension, full effects of pre-literacy and vocabulary skills were observed. Hierarchical regression models further revealed the relative contributions of each factor to respective reading outcomes.”
Evolution of reading and face circuits during the first three years of reading acquisition (open access preprint)
“The growth of word-induced activation at the classical coordinates of the VWFA [visual word form area] is primarily due to schooling. The growth of face responses, particularly in the right hemisphere, is primarily affected by age rather than by schooling.”
Inclusion, Dyslexia, Emotional State and Learning: Perceptions of Ibero-American Children with Dyslexia and Their Parents during the COVID-19 Lockdown (open access)
“The results offer a comprehensive viewpoint (family and children) on the aspects that have helped and hindered learning, such as teacher and family support, emotional state, use of ICT, and the importance of the voluntary/association network. The study provides evidence of how lockdown and school closures have created additional difficulties for learning but also how certain educational processes have been bolstered with the support of technological resources that should serve as benchmarks for education policy and classroom practice.”
Can a game application that boosts phonics knowledge in kindergarten advance 1st grade reading? (open access preprint, not sure if peer-reviewed yet)
“In a crossover effect, children who used the phonics version improved in letter naming, grapheme-phoneme matching and reading fluency, while those with the number version improved in number knowledge. In a longitudinal follow-up, intervention participants maintained an advantage in phoneme awareness and grapheme-phoneme matching at the start of 1st grade, but this advantage failed to translate into school literacy gains in the middle of 1st grade, and no longitudinal benefits were found for numbers. Those results improve our understanding of when and for how long to introduce phonics and question the possibility that a short-term intervention may address the complex challenges of long-term educational goals.”
Opinion Pieces
Reading Research Quarterly (RRQ) has a new special issue out…
Teaching Reading Is More Than a Science: It’s Also an Art
Conflict or Conversation? Media Portrayals of the Science of Reading
Other
Understanding the Research-Practice Gap for Speech-Language Intervention via SLPs’ Endorsement of Myths (an interesting poster, not yet peer-reviewed)
In-Depth: AI in education
Instead of covering a reading research paper this week, I wanted to share a few thoughts I had after recently finishing The Alignment Problem and watching Coded Bias. (You can rent Coded Bias from an indie theater for about $12, but you will have to create an account.)
AI & Reading
With regard to reading development/instruction, AI (or, more likely, machine learning) is generally baked into products that involve speech recognition or eye-tracking for the diagnosis and/or remediation of reading difficulties.
While I agree with other reading researchers that AI has the potential to solve problems in reading, the reality is that it is still early days.
What is the problem?
In a nutshell: there is little to no oversight in the field of AI right now, which can lead to biased algorithms.
Bias in algorithms, for example, can lead to:
sending the wrong people to jail (law enforcement in several cities)
mislabeling photos of Black people (Google's image-recognition software)
penalizing women's resumes in hiring (Amazon's scrapped recruiting model)
There are many more examples detailed in Christian’s book and in Coded Bias.
Bias in algorithms can be broken down into three categories: representation, fairness, and transparency.
Representation
What data is being used to train the model?
Representation primarily concerns the datasets being used: does the data used to train the model represent the population it will be deployed on? Lack of representation in the training data is what produced the flawed/biased Google and Amazon models listed above. For example, Amazon optimized its hiring model for the similarity of new resumes to the resumes of previous hires. The problem was that most of its hires over the previous ten years were men, so the model learned to score women's resumes negatively. Even when the engineers told the model to ignore words like "women's" before activities or sports (e.g., "women's volleyball team"), it still learned to pick out more subtle features, such as the names of women's colleges. Unable to find a solution, Amazon scrapped the model.
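To make that concrete, here is a minimal sketch in Python. This is emphatically not Amazon's actual system; the data, features, and numbers are all invented to show how a model can penalize a proxy feature even when the protected attribute itself is withheld:

```python
# Hypothetical illustration, not Amazon's real model: synthetic data where
# gender is withheld from the model but a correlated proxy feature
# (attending a women's college) is not.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

is_woman = rng.random(n) < 0.5
# The proxy feature correlates with the withheld attribute.
womens_college = np.where(is_woman, rng.random(n) < 0.4, rng.random(n) < 0.01)
skill = rng.normal(size=n)

# Biased historical labels: past hiring favored men regardless of skill.
hired = (skill + np.where(is_woman, -1.0, 1.0) + rng.normal(size=n)) > 0

# Train only on skill and the proxy; gender itself is "ignored."
X = np.column_stack([skill, womens_college.astype(float)])
model = LogisticRegression().fit(X, hired)

# The proxy feature picks up a large negative weight anyway.
print(dict(zip(["skill", "womens_college"], model.coef_[0].round(2))))
```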
Representation in education and reading research is still an issue, and there has been a greater push for researchers to report detailed demographic information on the participants in research studies to make sure the findings are generalizable. Information about the representativeness of the datasets used for reading-related AI is extremely difficult to find. I checked the websites of a few companies using AI in reading, and none mentioned the demographic breakdown of their training datasets. In fact, it was hard to find any information about how they trained their models at all.
Fairness
What are the model engineers using to assess their “ground truth”?
Is the model using a true "ground truth" indicator or a poor proxy? For example, with the flawed AI used in law enforcement, the developers wanted to predict the chance that a defendant would commit a crime in the future, but that is not what they actually measured. Rather, they assessed whether a defendant was rearrested or reconvicted. These are poor proxies because certain groups of people are more likely to be arrested in the first place.
Christian states, “If there are systematic differences in the likelihood of people from different groups to be convicted after arrest, or arrested in the first place, then we are at best optimizing for a distorted proxy for recidivism, not recidivism itself. This is a crucial point that is often overlooked.”
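A toy simulation makes the distortion concrete. All of the numbers below are invented; the point is only that two groups with identical true reoffense rates can look very different when the label is "rearrested" rather than "reoffended":

```python
# Toy simulation of the proxy problem: identical true behavior, unequal
# policing, and the training label diverges by group.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

group = rng.integers(0, 2, n)        # two demographic groups
reoffends = rng.random(n) < 0.30     # identical true rate for both groups

# Unequal policing: an offense by group 1 is twice as likely to lead to arrest.
p_arrest = np.where(group == 0, 0.25, 0.50)
rearrested = reoffends & (rng.random(n) < p_arrest)

for g in (0, 1):
    true_rate = reoffends[group == g].mean()
    label_rate = rearrested[group == g].mean()
    print(f"group {g}: true reoffense {true_rate:.2f}, rearrest label {label_rate:.2f}")

# A model fit to the rearrest labels will "learn" that group 1 is riskier,
# even though the underlying behavior is identical.
```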
As fairness relates to AI in reading, I think in education we need to be especially careful about the measures we use. It is easy to imagine an AI model built to predict scores on some high-stakes test, but if that high-stakes test is itself systematically biased, it might not be a good measure to use.
Transparency
How is the model reaching a conclusion?
Transparency has to do with how understandable the model is. Lack of transparency/interpretability exacerbates the problem of bias in models because it makes it difficult to see how the model arrived at a particular conclusion. In some fields it might be fine to have a model that is excellent at predicting a certain outcome even though it can't tell you how it got there. For example, if my neural net were 99% accurate at telling me GameStop's stock price at market close each day, I wouldn't necessarily care how it reached that conclusion.
However, in other fields like medicine and education, we really need to know why a model is functioning the way it is. We need to know which features and inputs are salient to the outcome and what is going on inside the "black box." The goal in education is to change student outcomes for the better, not simply to predict them. Model interpretability is an exciting new field, and reading about the new developments in this area was my favorite part of the book.
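For a taste of what interpretability work looks like in practice, here is a minimal sketch of one common technique, permutation importance: shuffle one feature at a time and watch how much the model's accuracy drops. (The model and data here are generic scikit-learn placeholders; real audits go much deeper.)

```python
# Permutation importance: shuffle each feature and measure the accuracy
# drop, a rough view of which inputs the model actually relies on.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=5, n_informative=2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: accuracy drop when shuffled = {imp:.3f}")
```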
So, why didn’t Google/Amazon/police departments catch these biased algorithms?
This is a complicated answer because it is impossible to know all the little things that went wrong along the way, but it seems they (the engineers, their bosses, the leadership teams) just didn't check their work closely enough. AI models can be extraordinarily complex, and, as discussed above under "Transparency," it is hard to know why they arrive at a certain prediction. The most successful AI methods for prediction (deep learning methods such as neural networks with many hidden layers) are often not understandable or explainable even by the people who created them (though this is changing as new developments are made).
Another reason put forth for why these biased models were allowed to be deployed has to do with awareness: perhaps the engineers who deployed the models simply were not aware that bias could be an issue. Joy Buolamwini highlights that most algorithms are developed by white men, and it is no coincidence that the models perform best on data from white men. If you think lack of representation is just a "big tech" problem, I would encourage you to look at the engineering and leadership teams of the companies developing AI for education and of the large EdTech companies adopting/incorporating it.
The Wait to Fail approach doesn’t work…
I have asked EdTech companies some of the above questions, and they either did not have answers or were unwilling to share them because their models are proprietary and, in essence, the "secret sauce" of their business models. As an EdTech founder, I understand that concern, and it is legitimate, but there needs to be some way of checking for bias in educational AI before widespread deployment/adoption. [Side note: I emailed Brian Christian, the author of the book, to ask if he had come across any solutions in his research. He asked a couple of his friends on Twitter, but this is a new field without established protocols/standards.]
It is evident from the above examples that blind trust in AI coupled with a "wait and see" approach can be detrimental. This is especially true in education: imagine the implications of a biased speech algorithm that was supposed to help children with reading impairments. Who would even notice the bias? The child using it? Would they recognize it as biased, and would they feel confident reporting it to a teacher? I doubt it. Do teachers have time to sit down and test different dialects/words/errors on the AI program being used? As the Amazon resume fiasco made evident, AI models can learn to latch onto very subtle features in the training data, so a quick trial run might not suffice to uncover an issue. There is no easy answer at this point, but those of us in reading/education know that the "wait to fail" approach does not work, and we can't just sit around and hope that the Joy Buolamwinis of the world will do the work for us.
So, what can we do about it right now?
As a teacher, parent, or administrator who makes purchasing decisions, you owe it to your students (or children) to ask questions and educate yourself. Until there is a third party that can audit AI/ML (like a "Consumer Reports" for tech companies), the burden falls on us to ask the important questions. You do not need to be an AI/ML expert to ask questions that can help determine whether a model might be inappropriate for your students (see below).
What should you ask?
I would ask companies that use AI to share their basic research studies. Not studies about AI generally (as most post on their websites), but specifically: what studies have they run?
I would ask about sample size and demographics (you want to make sure the sample they used reflects the students you will be using the product with). What datasets did they use to train their model?
I would ask how they define accuracy. When they say their system is 98% accurate, ask: on what criterion? Is their “ground truth” measure a good one, or is it a poor proxy for what we really want to know?
I would ask for their accuracy data broken down by demographic. When their system fails, does it fail equally across gender and racial backgrounds, or, as Coded Bias showed, is it worse for some demographics? (A sketch of this kind of disaggregated check follows below.)
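Here is what that disaggregated check might look like; the group labels and numbers below are hypothetical, but they show how a respectable overall accuracy can hide a large gap between groups:

```python
# Disaggregated accuracy: overall accuracy can mask per-group failures.
# All names and numbers here are hypothetical.
import pandas as pd

results = pd.DataFrame({
    "group":   ["A", "A", "A", "B", "B", "B", "B", "B"],
    "correct": [1,   1,   1,   0,   1,   0,   1,   0],
})

print("overall accuracy:", results["correct"].mean().round(2))  # 0.62
print(results.groupby("group")["correct"].mean())  # A: 1.00, B: 0.40
```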
These are just my thoughts/opinions and I’m sure there are other questions I am missing. I would love to hear your thoughts. Feel free to comment on the post or reply directly to me.
Side Tangent: The AI buzz
AI is so trendy right now that businesses use it as a buzzword to attract customers, even though what they are using is likely not truly AI but rather relatively 'simple' machine learning models that have been around for decades.
For example, every time I use a stats program like R to fit a regression model, that is technically machine learning, since the machine is learning to find the optimal regression weights/coefficients to fit the data.
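Here is that idea in a few lines of Python (a stand-in for the R call; the data are simulated): the "learning" is just solving for the line that best fits the points.

```python
# Ordinary least squares as "machine learning": the machine finds the
# slope and intercept that minimize squared error on the data.
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)  # true weights: 2 and 1

slope, intercept = np.polyfit(x, y, deg=1)  # the "learning" step
print(f"learned slope={slope:.2f}, intercept={intercept:.2f}")
```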
AI is like an extreme form of machine learning where the computer is not only learning how to fit that regression model but is also teaching itself how to fit future regression models (i.e., it is doing a human thing: trying to replace me as the researcher). Anyway, it is almost irrelevant what they call it, because both machine learning and AI can have the serious flaws described above.
Companies touting their AI
[Screenshot from Lexplore]
[Screenshot from Dystech]