
Open-source in generative AI holds a promise for universities in developing countries

Generative AI is now a widely used term thanks to ChatGPT, which is based on a Large Language Model (LLM) called Generative Pre-trained Transformer (GPT). In general, the use of LLMs in education can be transformative. For instance, they can be used to create intelligent tutoring systems capable of providing personalised learning experiences to students. These systems can answer students’ questions, provide explanations and even generate practice problems. LLMs can also be used to translate open educational resources (OER) into local languages, making education more accessible.

Educators have noted some challenges when LLMs are used extensively in teaching. Concerns include the accuracy of outputs, implicit and explicit biases, and the cultural appropriateness of the outputs. While some of the explicit biases can be addressed, there is no clarity on the removal of implicit biases.

A major concern is privacy. It has been shown that an adversary can extract or reconstruct exact training samples from LLMs, which can reveal personally identifiable information. Ethical concerns in AI also include how the training data for an LLM was acquired. This should concern educators who use the models.

Recent GPT models are commercial and cannot be repurposed. Open-source LLMs have emerged as promising tools, especially for developing countries. These models, pre-trained on vast amounts of data, can be fine-tuned to perform various tasks, from language translation to answering complex questions, making them a versatile asset in the educational sector. Like open-source software, they can offer a wide range of services in education with a lower cost of ownership. They can be downloaded and hosted locally. Some can be run on consumer-grade computers in an institutional network.

Several open-source LLMs are available today, and the number is growing. BLOOM is the largest LLM available in the open-source domain, with about 176 billion parameters. It can generate outputs in 46 human languages and 13 programming languages. Four different models in the Large Language Model Meta AI (LLaMA) family, owned by Meta (parent of Facebook), have been made available to the public, the largest having 65 billion parameters and all pre-trained with quality data. LLaMA2 is a fully open LLM.

Pre-training LLMs is a resource-intensive process, often requiring significant computational power and financial investment. However, when in the open domain, these models allow third parties, researchers and practitioners to fine-tune them using their institutional or private data to accomplish their AI tasks. This approach reduces the cost of leveraging LLMs, making them accessible to institutions with limited resources, such as universities in developing countries. The process of fine-tuning can also help address some of the concerns about accuracy or explicit biases.

Among the new open-source LLMs, Vicuna 13B is gaining popularity. It is a fine-tuned version of LLaMA, and its developers report that it reaches about 90 per cent of the quality of ChatGPT and Google Bard in an evaluation that used GPT-4 as the judge. The cost of fine-tuning Vicuna 13B was about USD 300.

A significant development is the release of Falcon 40B by the Technology Innovation Institute in the United Arab Emirates. It is a model pre-trained on high-quality data of about 750 billion words, or about a trillion tokens. As a foundational LLM, it can be fine-tuned for any task. A smaller version with seven billion parameters is also available, which can be run at a reasonable cost. Falcon 40B is an example of how a nationally co-ordinated effort can invest in creating its own high-quality LLM for unrestricted use.
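To make the sizes mentioned above more concrete, here is a rough back-of-the-envelope sketch in Python. The parameter counts and the words and tokens figures are taken from the article; the storage precisions (2 bytes per parameter for 16-bit weights, 0.5 bytes for 4-bit weights) are assumptions, and the function names and the label "Falcon 7B" are illustrative only. The estimate covers the weights alone, so actual memory use when running a model would be higher.

```python
# Rough sizing sketch. Parameter counts come from the article; the
# bytes-per-parameter values are ASSUMPTIONS about storage precision.
PARAMS_BILLIONS = {
    "BLOOM": 176,
    "LLaMA (largest)": 65,
    "Falcon 40B": 40,
    "Vicuna 13B": 13,
    "Falcon 7B": 7,  # illustrative label for the seven-billion-parameter version
}


def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes).

    One billion parameters at one byte each is one gigabyte, so the
    estimate is simply parameters (in billions) times bytes per parameter.
    """
    return params_billions * bytes_per_param


def tokens_per_word(tokens: float, words: float) -> float:
    """Average number of tokens per word implied by a corpus description."""
    return tokens / words


if __name__ == "__main__":
    for name, size in PARAMS_BILLIONS.items():
        at_16_bit = weight_memory_gb(size)       # assumed 2 bytes per parameter
        at_4_bit = weight_memory_gb(size, 0.5)   # assumed 0.5 bytes per parameter
        print(f"{name}: ~{at_16_bit:.0f} GB at 16-bit, ~{at_4_bit:.1f} GB at 4-bit")

    # About a trillion tokens for about 750 billion words.
    ratio = tokens_per_word(1e12, 750e9)
    print(f"Falcon 40B training data: ~{ratio:.2f} tokens per word")
```

Under these assumptions, the seven-billion-parameter model needs roughly 14 GB for its weights at 16-bit precision, which is why it is a more realistic candidate for modest institutional hardware than a 176-billion-parameter model.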

The Russell Group of universities in the UK recently published a statement of principles for using generative AI in education. The principles include promoting AI literacy, building staff capacity, adapting AI tools and systems, and maintaining academic rigour and integrity while sharing best practices.

Open-source LLMs can help universities adhere to these principles and to the ethics of AI. They can also be run offline within universities, which can fine-tune them to perform relevant AI tasks.



Connections (vol. 28, no. 2) Copyright © 2023 by Commonwealth of Learning is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.
