Ep. #10, Ethical Benchmarks for AI with Dr. Marie Oldfield
In episode 10 of Generationship, Rachel Chalmers and Dr. Marie Oldfield explore ethical benchmarks in AI. This conversation sheds light on professional accreditation for AI practitioners and offers unique insights on mitigating societal risks while building better AI models. Additionally, Marie shares lessons learned from her expansive career journey across Government, Defense, Legal, and Tech.
Dr. Marie Oldfield, CStat, CSci, FIScT, SFHEA, APAI is the CEO of Oldfield Consultancy, Executive Board member at the Institute of Science and Technology (IST) & Senior Lecturer in practice at LSE.
Dr. Marie Oldfield is a recognized, published AI and Ethics Expert with a background in Mathematics and Philosophy. Marie is a trusted advisor to Government, Defense, and the Legal Sector among others. Marie works at the forefront of Ethical AI, driving improvement and development. Marie is the creator of the AI Professional Accreditation that benchmarks practice in AI and interdisciplinary modeling.
Transcript
Rachel Chalmers: Today, I'm absolutely thrilled to have Dr. Marie Oldfield on the show. Marie is the CEO of Oldfield Consultancy and Senior Lecturer in Practice at the London School of Economics. She is a recognized, published AI and ethics expert with a background in mathematics and philosophy. Marie is a trusted advisor to government, defense, and the legal sector, amongst others.
She works at the forefront of ethical AI, driving improvement and development. Marie sits on the board of the Institute of Science and Technology and is the creator of the new AI Professional Accreditation that benchmarks practice in AI and interdisciplinary modeling. Marie, thank you so much for coming on the show today.
Dr. Marie Oldfield: It's my pleasure, thanks for having me.
Rachel: It's our pleasure, absolutely. How did you come to establish a consultancy around ethical AI and modeling?
Marie: I think that's quite a complex answer because it goes back quite a long time. I've been working in modeling, statistics, and mathematics for a long time now. And because I've been doing a lot of that in government and defense, I saw a lot of potential issues that could come up, and we weren't able to really deal with those in that environment.
So moving out into a consultancy, I was able to work with the government and defense as a client and come back in that way and have more kind of ownership, responsibility and accountability in fixing the issues that were already there.
And also it seemed like a good time to branch out by myself, because I was becoming more interested in a variety of work and a portfolio of work, and in having work packages that were mine, rather than having to go into the office and be told what to do every day, or creating something that wasn't necessarily completely in my control. So being able to control my own destiny was one of the reasons.
Rachel: Yeah, and that doubtless cuts back into some of the issues that you saw in the modeling. What are some of the potential problems that you saw, particularly in the use of data modeling in defense?
Marie: There's a myriad of issues, actually, and some of it is technical and some of it's human factors. And I've just completed a large program of research on this, which includes some empirical studies from practitioners. So asking them directly, what are the issues that you're facing?
And as we went from statistics to machine learning, to data science, to AI, what's happened is there's not been a professional body really following that trajectory. And so there's not been the professional benchmarking. The education system hasn't followed that trajectory either. So there isn't really an ethics benchmark in there.
And then government has decided to try and legislate and regulate, and without the education and the professional body and the accountability, it's pretty difficult to then start to hold people accountable when, in many cases, they've not actually ever really been trained. Many people do degrees, and then they come out and they get into really complex modeling really quickly.
And what they might not have is the understanding of how you do this ethically. They might not be a statistician. They might not know how to collect the data. They might not be a software expert. They might not know how to implement with user testing.
They might not be somebody that considers the context within the data, which is what we're trained to do as statisticians. And so they might not understand the whole flow of the model. And you can see in cases like the Home Office visa algorithm and facial recognition technology, that this has failed and it's failed really hard.
And some of the reasons behind this are not only the lack of interdisciplinary experts and the different types of people that you need in the pipeline, but the human factors of not being able to challenge people, not being able to raise issues within a model, not being able to use the correct kind of code bases, repositories, not being able to get the correct data, being told that something's needed now and you've not got the time to do anything else.
So just build something and tell me what the answer is. And all of that feeds into really poor models that then impact on society and individuals really badly. And in my case, in my work, it can cost lives. And people get surprised when I say that, but it does. It costs lives not only in defense, but in society. And that's something that we want to avoid.
So being able to get ethical benchmarks into the practice of building models and using AI, data science, everything like that, and bringing in the right people at the right times, having the right expertise on board, yes, it's going to cost you a bit more money. Yes, it might take a little bit more time, but when you get to the end of it, you've not wasted 10 million pounds worth of taxpayers' money and you've not killed people.
So to me, that extra month and that extra bit of expenditure upfront is completely worth it.
Rachel: And this is where the agency and autonomy of being an external consultant and having that third-party perspective can be so powerful for you. For our American listeners who may not be as familiar with the Home Office fiasco, can you just give a brief outline of what happened?
Marie: Yeah, really briefly, it was an algorithm that was designed to assess visas, and it was developed very rapidly without the correct expertise on board. And when it was implemented, it caused a huge amount of havoc because it just didn't work.
And what you've done there is you've wasted a lot of money, caused a lot of societal upset, and then you have to go back to the beginning and start again, which nobody wants to do. If you break it up into milestones, you can go back to the previous milestone and then work forward from that, but this just completely failed. It got completely withdrawn.
And it's very rare for that to happen, but it shows the actual scale of potential damage that that could cause, that the whole system was completely withdrawn.
Rachel: And for every big public disaster where the algorithm is scrapped, we know there's a hundred algorithms that are also underperforming and putting lives and livelihoods at risk, but not failing in such an obvious way.
And so they continue to be out there in the market and people continue making decisions on the basis of them. Is that the sort of thing that keeps you up at night? Is that what your work is aimed at managing?
Marie: It's something that I spot a lot, especially with kind of health apps. If you change your date of birth in the health app, it will tell you that you've got a completely different healthy age. So if you're a bit older, it might tell you that your healthy age is near to your actual age. If you put in that you're a bit younger, it might give you a much nicer younger age.
So there's not really any consistency in some of these algorithms and the kind of way in which you can plug and play these days, you can just build them even if you're just, you know, you come off the street and you're saying, "Right, I want to build something."
I mean, obviously you might need a bit more technical know-how than that, but if you just kind of build them in an app environment, you can kind of build whatever you want and there's not really any oversight. There's not really any requirement for you to have the right people there or the expertise and then you can just sell them on an app store.
And this leaves people open to manipulation. It leaves them open to believing things about themselves that are potentially not correct and in certain circumstances, it can lead to fear and distress and it can lead to vulnerable people being manipulated in certain ways as well.
So what we're trying to do is avoid all that, avoid the damage to society, and benchmark and formalize the profession of analysts and model builders, anybody that works in that area really, so that they're not only respected a bit more, because what they're doing is performing a serious function for the world, but also so we can help people with guidelines and say, "This is how you might do this as best practice, this is the guidance you might use, this is the legislation that might affect you."
And then making sure that there's a really nice ecosystem that works for positive modeling, that's fit for society, not the kind of, you know, it's a bit of a Wild West that we're currently in.
Rachel: How might we mitigate some of these risks? Like in the specific case of the health apps that have wildly varied outcomes based on the input of the age, how do you equip people to build better models than that?
Marie: I think it's about not trying to chase profit in some circumstances, because it's really easy to say, "This is AI, I'm going to go and get some grant funding, I'm going to put in for some funding source, or I'm just going to build it and sell it." And whether you sell it depends on how good your marketing is, not necessarily on whether it's any good.
So there's kind of a few schools of thought on how you deal with that. Well, you can put in regulation and legislation, but in that case, you have to train people so that they understand what that means and how they're being told to model.
But then what that means is the people that are providing the regulation and legislation have to set out how you would model, but nobody's really set that out yet. So there's a lot of institutes and groups and working groups, and everybody's pushing out all these ethical guidelines, they're all pushing out guidance: this is how we do it, this is how you should do it.
And if you're kind of an SME, or you're somebody that's new to the profession, you're saying, "Whoa, hang on, how do I navigate this environment, but not get into trouble, but build something that is useful and that I'm being asked to build, or that I want to build?"
And that's a pretty tricky situation to be in. And that's one of the reasons that we've established a professional accreditation at the Institute of Science and Technology.
Rachel: So tell us the story of how the professional accreditation came about. When was the idea first floated and how have you built it since then?
Marie: It came about really from the empirical study of practitioners and a paper that I did on how modeling takes place in the government. And what we found was there really wasn't very much best practice, and there really wasn't an understanding of how you might bring on the right people to model.
There wasn't any ability to challenge, and people just weren't respected. The analysts weren't respected and they were just being completely overridden by whatever policy there was at the time, which is not how government is supposed to work. It's supposed to be evidence-based. You're supposed to take all the evidence, weigh it up, and then say, "Right, let me build a policy that's fit for society on that."
But the opposite was happening and this is quite damaging. So when I asked the practitioners, what kind of things do you do to ensure ethical modeling? They said, "Well, there's too much guidance. There's too much regulation. We don't know how to deal with it."
So in the majority of cases, the answer was: we use PR on the back end to try and fix any issues, and then we can just carry on, because we can use the marketing machine to clear that up. That's not really very ethical.
So what we want to do is enlighten people as to how they can navigate the different pieces of information that are coming at them, and what best practice looks like. And in the professional accreditation, what we do is we assess people by these different criteria and we say, "How do you collect data? How do you model? What do you do for user testing?"
And the beauty of the accreditation is it doesn't necessarily have to be a technical person. You can be a philosopher working on a model and you still have to understand how does that model work? What decisions get made? How do they get made? And then you can still go for the accreditation.
Rachel: Wow, I wish we'd had this for psychology 20 years ago so we wouldn't be running into the replication crisis.
Marie: You know, it's one of these things that takes you shining a light on it to say, there's a little bit of a gap there and it becomes even more difficult when people are going for funding because there is no ethical gateway for funding.
So our funding bodies just say, "Have you got something interesting?" Not what ethics are surrounding this. And by ethics, it might be, what is the data context? How do you want to minimize harm to society? Have you looked at the demographic that you're targeting here? Is it correct? Are you discriminating against anybody? Even basic questions like that are not being asked.
And it was even worse in government where these models were just consistently broken. And I was called in to fix a few of them. And the problem is that there's a lack of resources, a lack of understanding. Even though there is best practice in government, it's not really very widely known and it's not very widely used in some departments.
So then you face an uphill struggle. So we said, "How can we provide a professional body and a professional accreditation to try and fill some of these gaps that are in education, in the profession at large, and then help to make modeling safer for society?" Because it seems like that might be a good way forward at this point with all the different problems that we've had with models in the past.
Rachel: What kinds of people are coming to you to obtain the professional accreditation? Is it a mix of government and academia and private sector? Who are the people who are drawn to it?
Marie: It's a really big mix, actually. We've got people from industry, academia, government. We've got people from all over, SMEs, small business, large business, especially large business because it is quite time-consuming and intensive to work up any level of professional benchmark because there's so many moving parts within a modeling project.
So the fact that this has already been done, and it's been approved by industry, government, and academia, means that we've now got something that's been validated externally. We've not just done it ourselves and said, "We think this is our way of best practice." We've checked it across multiple industries.
So it attracts people like, we've got philosophers, archeologists. It might be, I mean, a lot of people think, why would an ethno-historian or an archeologist want to get a professional accreditation in AI? If you're looking at rock art and you're trying to analyze that with AI or you're looking at native art, you might not consider the context in which the culture views their art or the different connotations that they work with in interpreting it.
If you can't understand that, then it is very difficult to ethically do anything with AI with that art. And it's the same for data. If you don't understand the data, you can't possibly model it correctly because you don't understand enough about what the data is telling you. And that's a really important story.
So that was kind of the driver to make it interdisciplinary, because you can't just say, "Well, hang on, if you build it you can be accredited, but if you don't necessarily build it, even though you have a huge input into it and you frame what needs to be modeled and decide how it needs to be modeled, you can't be a part of it." We need everybody to be part of this conversation to move forward in a reasonable manner.
Rachel: Your point about archeology is really well taken because people think it's about dead people and it's not, it's about living cultures. And in the 30 years since I studied it, it's become much more explicitly political and we're starting to see the return of some sacred items to, for example, Australian Aboriginal people, which is super meaningful to me. So thank you for calling that out. That's a really significant thing.
I want to talk about your theory of change. How will you know if the people who have your professional accreditation are able to be heard in these environments and are able to state their ethical positions and their priorities in a way that actually moves the needle? What will you measure to measure the efficacy of the program?
Marie: This is pretty tough. And it's a very difficult discussion that we've had around this accreditation because in any profession, you have to uphold codes of ethics and you might have to have really difficult conversations. So that necessitates some sort of soft skills and maybe training if you're wary of conflict.
However, these conversations need to happen because if they don't, then you're not going to move forward at all. Like you said, you're not going to move the needle on it. What I came up with in order to deal with that was a piece of software called Transparent AI.
And it works through the whole modeling process to say, who do you need there? What challenges are there? What risks are there? If the board needs to know what level of risk is currently in this model, what is the level of risk? Have you bypassed any major steps, any major documents that you need to have present? Have you done an ethical review? Have you done user testing? Lots of different aspects that you need to consider that aren't necessarily normally considered.
Because I felt that even with an accreditation, even if we get managers in there as advanced practitioners, there still might not be the culture within the business to be able to have the discussions that we want people to be having.
So when I looked at developing this software, I was speaking to industry at the same time and governments and saying, what would this look like if you could have an openness and a conversation and a way to be able to highlight issues that meant they could get fixed and they could get some priority. And this type of kind of pipeline software was something that I came up with in response to that.
So we're coming at it from numerous different angles, but we're hoping that the professional accreditation, as it spreads, will show that actually you don't have to do excessive amounts of work to make your work ethical. You just have to be more mindful of how you're doing it and more effective in what you're doing.
Rachel: That's really brilliant, that Transparent AI serves as a sort of live checklist to make sure that you're at least considering the parameters of what it means to create an ethical model.
Marie: It holds conversations as well. So if there are any challenges, then it sits in there. And that's really important because as we move towards regulation and legislation, if your AI is taken to court for some reason because it does something really bad, lawyers are going to start reverse engineering what the decisions were within that project team, what the conflicts were, and what the challenges were.
If you've got a really transparent way of recording, if I've raised the challenge five times and it's never been addressed and then it goes to court, then it's not me that's responsible for that. It's either the culture within the project management team or it's the culture within the business. So it's not a get out of jail free card by any means, but what it does do is it records exactly what happened and it gives you lessons for the future so that you can take those lessons and work on them and do better in the future.
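As a rough illustration of the "live checklist" idea described here, the sketch below shows one way a modeling team might record time-stamped challenges and surface unresolved risks for review. This is not the Transparent AI product itself; every name and field is a hypothetical assumption chosen only to make the concept concrete.

```python
# Hypothetical sketch of a "live checklist" for a modeling project.
# All class names and fields are illustrative assumptions, not a real product API.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Challenge:
    raised_by: str
    description: str
    resolved: bool = False
    # Each challenge is time-stamped so it can be audited later.
    raised_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class ModelingChecklist:
    project: str
    ethical_review_done: bool = False
    user_testing_done: bool = False
    data_context_documented: bool = False
    challenges: list = field(default_factory=list)

    def raise_challenge(self, raised_by: str, description: str) -> None:
        # Record a challenge rather than letting it disappear in conversation.
        self.challenges.append(Challenge(raised_by, description))

    def outstanding_risks(self) -> list:
        # Summarize anything a review board would want to see before sign-off.
        risks = [c.description for c in self.challenges if not c.resolved]
        if not self.ethical_review_done:
            risks.append("ethical review not completed")
        if not self.user_testing_done:
            risks.append("user testing not completed")
        if not self.data_context_documented:
            risks.append("data context not documented")
        return risks

# Example: record a challenge and report what is still unresolved.
checklist = ModelingChecklist(project="example-model")
checklist.raise_challenge("analyst", "training data over-represents one demographic")
print(checklist.outstanding_risks())
```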
Rachel: That's so incredibly necessary. Marie, just for one final question before I let you go, what are some of the newer challenges that your clients are facing? In the last nine months, we've seen the impact of large language models dwarf maybe everything that's happened in AI up until now. What are some of the more emergent threats that you see on the horizon?
Marie: There are huge human factors issues. When you bring technology like Alexas and algorithms into your home, you can share data and maybe not know that you're sharing the data, and the data can be shared across platforms.
The way in which Alexas and chatbots and things like ChatGPT are designed means that they want you to form an emotional attachment with them, which can reduce your risk awareness and increase your attachment to the technology. This can then mean that you're more open to manipulation, because it works on really basic human belief systems, really fundamental systems.
And this is becoming more apparent as we see new generations come up and they are immersed in this technology to the point where it might read them a bedtime story, it might be their teacher, it might be something that they talk to and they think it's equivalent to a real person when it's designed to have a conversation with you in a certain way.
And I think that that kind of human issue is going to be the big thing moving forward because it's the easiest way to get you not only maybe addicted to something like Twitter if it's telling you to click on things and making you angry and you're going back and clicking, but also to keep you in communities that maybe aren't serving you very well.
And they could be putting you in a certain mindset, they could be damaging you in some way. And it's very difficult to pick that up until it can kind of blow up. So we're having these issues around what does human interaction look like? What does human technology interaction look like?
How can we make sure that these things that are developed are not only behaving themselves in a way in which they're not sharing your data everywhere, but you've got an adequate risk awareness. And again, it comes back to education and people being aware of being able to critically analyze these technologies and say, how is that working?
What is that doing? How does it make decisions about me? Is it discriminating against me? Is it trying to make me do something? We're all kind of led towards doing things every day, such as watching a film, buying something. It can reach even higher levels when we're immersed in technology, such as things like Facebook or different platforms.
So I think on the technical part of it, there are lots of flaws at the minute, and we need to look at best practice and fill them. But the human factors piece is becoming ever more apparent, and we need to look at kids and look at how they're doing.
How can we keep them safe, and not keep them safe by judging whatever free speech is and restricting them and everything, but by making sure that they're aware of what they're doing and what that might be doing to them, and keeping all the nice things in the world like play and interaction with other humans, and not getting so far into that technological world that we then need to pull ourselves back out of it again.
Rachel: Yeah, the disinformation radicalization vortex turned up to 11. It's a terrifying vision of the future.
Marie, I'm so glad you're out there working on these things. It makes me hopeful. Thank you so much for coming on the show. It's been wonderful talking.
Marie: It's my pleasure. Thanks for having me.