Written by Luke Noothout and Philo van Kemenade
A design approach towards participatory machine learning
How can we help laypeople understand the complex trade-offs involved in developing and deploying machine learning systems? As AI is becoming increasingly important in public systems, we teamed up with researchers from TU Delft to develop a tool that enables citizens to participate in the discourse around civic machine learning systems.

Background

Machine learning technology has become widely adopted by municipalities and governments. But while civic AI is shaping public systems, it remains invisible to citizens.

Together with Kars Alfrink (TU Delft), who conceptualized this project as part of the Human Values for Smarter Cities research project led by the Civic Interaction Design group at the Amsterdam University of Applied Sciences, we explored a possible solution that informs citizens and enables them to participate in governing civic AI.

The biased systems that shape our everyday lives

When we think of machine learning algorithms, we often picture our social media feeds and personalized music recommendations. However, the influence of these algorithms extends far beyond our online experiences. They also shape how we live together in public spaces. Machine learning (ML) is a powerful tool to optimize and automate a wide range of problems, from the way transportation is planned to how waste collection is managed. But the adoption of ML in public spaces (often referred to as “civic AI”) comes with some tension.

Most ML algorithms we interact with online are tailored to our preferences as individuals. Because of this, we have some control over them, albeit in a limited way: we give feedback through the way we use a product, both explicitly (thumbs up/down) and implicitly (the content we interact with). In contrast, the experiences shaped by civic AI are shared among large groups of citizens. This makes it even harder to directly engage with such systems, which were already opaque to begin with.

The way AI systems operate is controlled by the people who deploy them. This might be fine in the context of commercial products and services, where (in most cases) users can choose whether or not to use the product. It becomes problematic when AI is used in public spaces, where there is no opting out. In these cases, the deployment and governance of AI systems is part of a democratic institution, and thus citizens are entitled to accountability and should be able to contest decisions.

Scrutiny of civic AI systems can reveal critical issues, as shown by a recent investigation by Lighthouse into welfare fraud detection algorithms. Their study, published in partnership with Trouw and MIT Technology Review, found that “these systems discriminate against vulnerable groups with oftentimes steep consequences for people’s lives”. The article goes on to argue that even when municipalities try to build “fair” AI systems (like Amsterdam’s welfare fraud detection), complex technical trade-offs create unavoidable biases. Despite these persistent challenges, municipalities and governments continue to adopt civic AI models, making it all the more vital to examine their potential biases and consequences.

Understanding complexity is a design problem

Because perfect fairness is a technical impossibility, trade-offs need to be made. Therefore these systems cannot remain invisible to the people whose lives they affect. Besides making AI systems transparent and accountable, it is crucial to allow citizens to have a say in how they are used. So how do we achieve this? This is not just a technical or political issue, but also very much a design challenge.

When it comes to civic AI we are not simply talking about technical decisions, but about value judgments: what trade-offs can we live with as citizens? Making these kinds of decisions together requires an understanding of the underlying systems and their trade-offs. This begs the question: how can we design for such meaningful participation?

We recently had the opportunity to explore this design challenge of making civic AI systems transparent and accountable. We were approached by Dr. Kars Alfrink, a postdoctoral researcher at TU Delft, to design and develop a prototype visualization that would do just this.

Kars is researching what he calls “Participatory Machine Learning” — a field that asks how regular citizens can be more actively involved in the decision-making around machine learning systems in public spaces. His work recognizes that these systems don't just have technical parameters; they embody political choices about whose needs are prioritized and which trade-offs society is willing to accept. As recent investigations have shown, even well-intentioned attempts to build “fair” AI face unavoidable mathematical constraints that require democratic input to resolve.

Interactive data visualizations as a bridge

This is where interactive data visualization can play a crucial role. Rather than leaving complex trade-offs solely in the hands of technicians and policymakers, visualization can make the consequences of different technical choices experiential and understandable. When citizens can see and manipulate the parameters of an AI system, they can participate meaningfully in conversations about how these systems should operate in their communities.

For our collaboration with Kars, we focused on a speculative use case for Amsterdam’s municipal scan car system: a fleet of vehicles equipped with computer vision that patrols the city’s streets, detecting parking violations. In theory, these cars could also be used for other purposes, such as detecting trash left on the streets. While less politically charged than welfare fraud detection, this system presents the same fundamental challenge: how do you balance catching genuine violations against avoiding false positives that burden innocent citizens?

The Vision Model Macroscope: a design sprint approach

Working within a tight nine-day timeframe, we used a version of Google Ventures’ design sprint methodology to rapidly ideate, prototype, and test a solution. Our goal was not to build a production-ready system, but rather to create a functional research prototype that could facilitate meaningful discussions between citizens, municipal workers, and AI engineers.

The prototype was designed specifically for focus group settings: deliberative sessions where participants could gather around a tablet and collectively explore how machine learning trade-offs play out in practice. These conversations would be guided by a facilitator, ensuring that technical insights could be translated into actionable policy discussions.

For our use case, we focused on Amsterdam’s speculative trash detection system as described above. Scan cars equipped with computer vision algorithms patrol the streets, automatically flagging instances where waste has been deposited incorrectly. When the system detects misplaced trash, it reports the location to municipal workers who must first verify the detection before dispatching cleanup crews. This seemingly straightforward process involves the same precision-versus-recall trade-offs that plague more sensitive AI applications: set the detection threshold too low and workers waste time investigating false alarms; set it too high and genuine problems go unaddressed, degrading neighborhood quality of life.

We called our prototype the “Vision Model Macroscope” (VMM) — a tool that would help participants see how individual algorithmic decisions aggregate into broader patterns of consequences across the city.

Making machine learning trade-offs experiential

As mentioned earlier, the use case for the VMM is Amsterdam’s automatic trash detection by scan cars. The data the VMM works with are detections of trash reported by the scan car. In the VMM, citizens have access to what is often called “ground truth”: for every detection, it is already known whether it actually shows trash or not. The VMM is designed to show how a single trade-off affects the performance of the system across different levels and perspectives.

Confidence limit

The trade-off in question is the “confidence limit”. Whenever the computer vision algorithm suspects it has detected trash, it will ascribe a confidence score to that detection. This score will always be between 0 and 1, where 0 means “absolutely not confident” and 1 means “absolutely confident”. Note that this score doesn’t say anything about right or wrong. A detection with a confidence of 0.01 might turn out to be indeed trash, while a detection with a confidence of 0.99 could very well still be wrong.

The trade-off that needs to be made is this: from what confidence limit should the algorithm treat a detection as true? If this limit is too low, it will report all detections, even those with very low confidence scores, resulting in many wrong detections. If the limit is set too high, hardly any detections will be reported at all. The main interaction of the VMM is the confidence limit slider that affects which detections will be reported. The rest of the VMM provides different views on how these reports affect the performance of the overall system.
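To make the mechanics concrete, here is a minimal sketch of what the confidence limit does under the hood: it simply filters the list of detections by a threshold. The Detection shape, its field names, and the function reportedDetections are our own illustrative assumptions, not the prototype’s actual code.

```typescript
// Illustrative shape of a single scan car detection (not the prototype's real data model).
interface Detection {
  id: string;
  neighborhood: string;
  confidence: number;       // between 0 (absolutely not confident) and 1 (absolutely confident)
  isActuallyTrash: boolean; // ground truth: known in advance in the VMM's dataset
}

// Only detections at or above the confidence limit are reported to the municipality.
function reportedDetections(detections: Detection[], confidenceLimit: number): Detection[] {
  return detections.filter((d) => d.confidence >= confidenceLimit);
}
```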

Detection of trash

Next to the confidence limit, all trash detections are plotted so that their Y-position corresponds with their confidence score. Because we are working with ground truth, we know whether a detection was correctly or incorrectly labeled as trash. This information is represented by the “correct” and “wrong” buckets, colored blue and purple respectively. Moving the confidence limit makes the trade-off visible: at almost any setting, some wrong detections are reported and some correct detections go unreported.
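In code terms, ground truth plus the confidence limit sorts every detection into one of four outcomes. A minimal sketch, reusing the illustrative Detection shape from the previous snippet; the bucket names are our own labels:

```typescript
// Reuses the illustrative Detection interface from the earlier sketch.
type Bucket =
  | "correct & reported"    // actual trash that gets reported
  | "wrong & reported"      // a false alarm that gets reported
  | "correct & unreported"  // actual trash that slips through unreported
  | "wrong & unreported";   // a wrong detection that is correctly held back

function bucketOf(d: Detection, confidenceLimit: number): Bucket {
  const reported = d.confidence >= confidenceLimit;
  if (d.isActuallyTrash) {
    return reported ? "correct & reported" : "correct & unreported";
  }
  return reported ? "wrong & reported" : "wrong & unreported";
}
```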

Summary

The summary describes the performance of the system on the most abstract level. Through both natural language and a Sankey diagram, it communicates how detections end up distributed across the review process, and what the resulting workload will be for the municipality workers tasked with reviewing them.
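A sketch of the kind of aggregation that could sit behind such a summary and Sankey diagram, building on the snippets above; the Summary fields are illustrative naming of ours:

```typescript
// Builds on the illustrative Detection, Bucket, and bucketOf sketches above.
interface Summary {
  reported: number;    // detections that land in a municipal worker's review queue
  falseAlarms: number; // reported detections that turn out not to be trash
  missedTrash: number; // actual trash that goes unreported
}

function summarize(detections: Detection[], confidenceLimit: number): Summary {
  const buckets = detections.map((d) => bucketOf(d, confidenceLimit));
  const count = (b: Bucket) => buckets.filter((x) => x === b).length;
  return {
    reported: count("correct & reported") + count("wrong & reported"),
    falseAlarms: count("wrong & reported"),
    missedTrash: count("correct & unreported"),
  };
}
```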

Maps

The maps provide more context about what the current confidence limit means in practice. They compare the impact of the current confidence limit across three neighborhoods in Amsterdam and show that the consequences might not be distributed equally. “Percentage of reported trash detections that are correct” shows how likely it is that a trash report in a neighborhood is actually correct. “Percentage of correct detections that are reported” shows how likely it is that actual trash in a neighborhood will be reported at all.
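These two map metrics are, in effect, per-neighborhood precision and recall. A sketch of how they could be derived, again under the illustrative data model used above:

```typescript
// "Percentage of reported trash detections that are correct" (precision) and
// "percentage of correct detections that are reported" (recall), per neighborhood.
function neighborhoodScores(
  detections: Detection[],
  confidenceLimit: number
): Map<string, { precision: number; recall: number }> {
  const scores = new Map<string, { precision: number; recall: number }>();
  const neighborhoods = Array.from(new Set(detections.map((d) => d.neighborhood)));
  for (const neighborhood of neighborhoods) {
    const local = detections.filter((d) => d.neighborhood === neighborhood);
    const reported = local.filter((d) => d.confidence >= confidenceLimit);
    const correctReported = reported.filter((d) => d.isActuallyTrash).length;
    const correctTotal = local.filter((d) => d.isActuallyTrash).length;
    scores.set(neighborhood, {
      precision: reported.length > 0 ? correctReported / reported.length : 0,
      recall: correctTotal > 0 ? correctReported / correctTotal : 0,
    });
  }
  return scores;
}
```

Because the amount of trash and the difficulty of detecting it can differ per neighborhood, a single city-wide confidence limit can produce different scores per neighborhood, which is exactly the inequality the maps make visible.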

Individual detections and scan car footage

The bottom two panels provide very concrete context for what we are talking about: trash in the city. They show the scan car footage and the trash the system detected within it. Color indicates whether these detections are correct or wrong, and whether they are reported or not. All of this combined gives an idea of how the system performs on the level of individual inferences.

Interaction and use

The VMM is designed to let citizens explore how the system as a whole works and play with the optimization process. The consequences of changing the confidence limit are made visible across a variety of scales, and trade-offs become tangible: do we care more about all reported detections being right, or about detecting all the trash? Do we optimize for equality across neighborhoods, or for the workload of municipality workers? Different stakeholders will likely have different opinions based on where they live or what their role in the system is, and coming to a shared decision requires dialogue and reflection, which tools like the VMM can facilitate.

Design principles for civic AI engagement

At the time of writing this article, Kars is using the VMM in a field study. While working on this project we had the opportunity to test a prototype with a select number of people and reflect on the VMM together.

A key insight to take into account when designing for civic AI engagement is this: trade-offs are inevitable. AI engineers will always need to make a decision between precision and recall. Do we care more about making sure that the system is always right, or do we want to catch as many cases as possible, including subtle edge cases? A decision that is fair for one group may well be unfair for another.
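For readers familiar with the standard terminology: with true positives (TP, correct reports), false positives (FP, false alarms) and false negatives (FN, missed trash), the two quantities being traded off are precision = TP / (TP + FP) and recall = TP / (TP + FN). Raising the confidence limit generally pushes precision up and recall down; lowering it does the opposite, and in practice there is rarely a setting that maximizes both.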

This illustrates the value of deliberation. Making a decision like this requires that all stakeholders are around the table, on the same page, and understand each other’s perspectives. Visualizing the effects of a trade-off plays an important role in experiencing why a personal preference might result in an undesirable disadvantage for others.

This process of deliberation requires both a macro- and micro-perspective on civic AI. High-level (macro) aggregate statistics on precision and recall illustrate the overall performance of the system. Low-level (micro) inferences provide concrete examples of how the system would perform in specific situations. Visualizations such as maps and Sankey diagrams help to connect these perspectives and drive an informed discussion about the trade-offs to be made.

And last but not least: the VMM is an experience. It provides direct manipulation of the system by letting people experiment with different confidence limits. This enables them to get a better grasp of the system’s mechanics, and builds intuition for the complexity behind the trade-offs.

Machine learning as a public concern through design

The Vision Model Macroscope demonstrates that making machine learning systems truly public isn’t just a matter of policy — it requires thoughtful design that bridges algorithmic complexity and democratic participation.

While we focused on trash detection, the same dynamics play out across different domains of municipal decision-making: energy grid management, traffic optimization, housing allocation. Any algorithmic system that embeds value judgments about fairness and acceptable trade-offs shouldn’t be left solely to engineers and administrators.

Participatory approaches can be relevant at multiple scales: from local policy decisions to national initiatives like the Netherlands’ AI Plan and even continental frameworks like the AI Act by the European Union. Beyond civic domains, they also translate to organizational contexts, where companies increasingly rely on ML for hiring, performance evaluation, and strategic planning. The question remains: how do we work with AI in ways that align with our values?
