
The use of Māori knowledge in AI: How does it affect indigenous data sovereignty and human rights?

Grimshaw Club

This briefing provides an overview of the importance of data sovereignty, and explains why the risk of data colonialism is increasing with specific reference to the Māori in New Zealand. This piece was written by Kandy Zhang and edited by Tanvi Sureka.


 


The Indigenous Māori organisation Te Hiku Media created a speech recognition Artificial Intelligence (AI) tool to help revitalise the Māori language, Te Reo Māori, for its community. The effort has also sparked anxieties over data sovereignty, as large tech corporations seek access to indigenous linguistic and cultural data, and Te Hiku Media is now trying to fend off large tech companies from using indigenous knowledge and data. The fight for indigenous data control reflects broader global power dynamics in which marginalised communities face data and cultural exploitation by technology companies in an AI-dominated world. Māori's collective resistance signifies a rising global movement to protect indigenous data sovereignty.

 

Why is data sovereignty important to the Māori Community?

 

Governments worldwide have signalled their intent to encourage the implementation of AI technologies across different industries, from education and technology to art and linguistics. However, the increased prevalence of data collection for AI machine learning has raised concerns about data sovereignty and security, particularly among indigenous and marginalised communities.


Māori, the largest indigenous community in New Zealand, have expressed anxieties about the collection of Māori cultural and linguistic data for AI. Te Reo Māori was the only language spoken in New Zealand prior to European settlement, making it intrinsically tied to indigenous identity. British colonisation in the 19th century spread English as the dominant language, threatening the prevalence of Te Reo Māori in New Zealand. In recent years, a speech recognition AI tool was developed by the indigenous Māori organisation Te Hiku Media, led by Peter-Lucas Jones, to assist with the revitalisation of the Māori language. Te Hiku Media was first established in the late 1980s as a tribal radio station and has since evolved to adopt Natural Language Processing (NLP) tools to train AI models for Māori speakers. The organisation is also a leading advocate for Māori cultural and data sovereignty, as global technology companies such as OpenAI attempt to retain and use Māori data in the machine learning processes behind their 'Whisper' software to train a bilingual Te Reo Māori and English model.


The implementation of AI for indigenous languages can be important for their survival in a digital society. However, whilst such technologies can be beneficial, allowing large tech companies to access indigenous data risks cultural exploitation, violation of human rights, and loss of data sovereignty. For example, Whisper's ability to transcribe Te Reo Māori has raised concerns about where the AI software obtained its data. Whisper's model was reportedly trained on 1,381 hours of Te Reo Māori, yet OpenAI's report and documentation of its machine learning processes do not state where that data was derived from, who granted access to it, or who gave the company the right to create work from it. This raises important questions about data sovereignty and whether the Māori community consented to providing their linguistic data to OpenAI. Data sovereignty is crucial because it protects indigenous knowledge and rights, which hold deep historical and traditional significance. This aligns with international human rights frameworks such as the United Nations Declaration on the Rights of Indigenous Peoples (UNDRIP), which emphasises the importance of indigenous data sovereignty. The loss of data sovereignty means the Māori culture is at risk of being stolen, exploited, and commodified by global tech corporations in this increasingly AI-dependent world.

 

What is happening now and why is the risk of data colonialism increasing?

 

Te Hiku's attempt to fend off large tech companies can be seen as the Māori people's final frontier against colonialism. This is not a traditional form of colonialism; it is data colonialism. Māori data is being harvested and used in AI systems designed by large corporations. Despite the positive economic value of AI innovation highlighted by the 2018 AI Forum of New Zealand, marginalised cultural groups may not experience these benefits equally. The AI industry is designed to serve existing structures of power and is shaped by wider political and economic forces beyond the Māori people's control. Allowing non-indigenous corporations to access Māori data, and thereby to become the generators of AI Māori language tools, risks reinforcing biases against indigenous culture. AI rules and machine learning programmes are created around the corporations' own perspectives on which linguistic knowledge should be prioritised and how the indigenous cultural world operates. This risks leading to data colonialism, as Māori will face the inequality and discrimination manifested by these AI language processors. It may also reinforce unequal global dynamics in knowledge and technological innovation, in which the Global South and indigenous communities provide the data resources while remaining dependent on external entities. The Māori community's battle over indigenous data is not limited to a local scale; it mirrors global tensions between powerful technology-driven nations such as the U.S. and China on one side, and smaller states and marginalised groups seeking to protect their digital assets and assert data sovereignty on the other.

 

The push to implement AI across all industries overlooks indigenous human rights and consciousness, and Māori's fight to stop global tech corporations from accessing their data reflects a larger global movement for indigenous rights. One social and political impact of this struggle has been the creation of a space for the Māori community to unite. Collective resistance against large tech corporations has strengthened the Māori people's identity and shared vision of self-determination. The recognition of shared goals fuelled the establishment of indigenous networks and advocacy platforms that promote indigenous resilience. For example, the Māori Data Sovereignty Network (MDSov) was established in 2015 to advocate for individual privacy and consent in the use of AI technology. A range of government initiatives aimed at protecting indigenous sovereignty have been developed in response to the network's demands. However, the ongoing Treaty Principles Bill in New Zealand, which seeks to reinterpret the principles of the Treaty of Waitangi that have historically guided settler-Māori relations and legislation, could threaten Māori rights and reverse gains in areas such as cultural recognition and language revitalisation. This shows that the fight between Māori and AI companies is an ongoing battle that extends beyond data sovereignty into broader struggles over indigenous identity and self-determination. It is a battle for cultural autonomy in an era where digital resources and AI hold significant power in the global political economy.

 

Further considerations


Current regulatory approaches fail to address the privacy implications of spreading AI technology to Indigenous communities. Legislation is important because we live in an information society, where laws are crucial to ensuring the responsible use of AI. The economic and political realities of an AI industry dominated by Western tech giants such as OpenAI and Microsoft highlight an increasing consolidation of power, ultimately leading to AI capitalism. When the industry is dominated by a few powerful players, they are able to determine technological standards, which reinforces data colonialism. Despite Te Hiku Media's efforts, Māori language data has already been obtained by big tech: an automatic speech recognition system, Whisper, was introduced in 2023. There are no legal obligations for companies to compensate communities for the data they collect, which violates Māori sovereignty and self-determination. This reveals the limitations of current legislative interventions in protecting and advocating for indigenous rights in the AI industry, and demonstrates the urgency of introducing appropriate legislation and effective policies to protect the rights of vulnerable communities such as the Māori in New Zealand.


It is important to acknowledge geopolitical power relationships and contemporary processes of neo-colonialism in understanding Indigenous communities' attempts to fend off global AI companies from harvesting their language, knowledge, and culture. The outcome of this struggle between Māori and global tech companies will not only define the future of Māori cultural and linguistic autonomy, but also influence how indigenous data is treated on a global scale.

 
 
 


