Harnessing AI in education with multimodality
The role of the teacher in harnessing AI in education and training is an emerging topic. This is particularly important with the wider use of generative AI services, of which ChatGPT is the prime example.
A number of universities have issued guidance notes on the use of generative AI (GenAI) services. The guidance that the University of Waterloo, in Canada, provides to a wide range of stakeholders is an exemplar. It covers essential topics, including the use of ChatGPT in teaching, the risks of cheating and the need to verify statements generated in a conversation.
Besides offering guidance, the University of Michigan (UM) has launched its own AI services: one offers GenAI models like ChatGPT to the university community, while another enables users to query their own datasets. A third, the UM GPT Toolkit, is a platform optimised for advanced users to construct, train and host AI models securely and at scale.
University-wide deployment of AI services can be expensive because it requires significant computing resources and costly personnel. However, Andrew Ng, an AI pioneer, offers a different view. He believes models with more than 100 billion parameters are not required for most tasks. GPT-3.5, the model behind the free version of ChatGPT, has about 175 billion parameters, but much smaller models, with one to ten billion parameters, exist for specific tasks and can run on a laptop computer. And there is a good chance that more such models will be available sooner rather than later.
In fact, at the Commonwealth of Learning, we have installed and run the open-source large language model LLaMA in two versions on a retired desktop computer: one with 13 billion parameters and the other with 70 billion. A user can have meaningful conversations with either model.
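For illustration, such a local deployment needs only open-source tooling. The sketch below uses the llama-cpp-python library with a hypothetical path to quantised 13-billion-parameter weights; it shows one possible setup, not our exact installation.

```python
# Minimal sketch: chatting with a locally hosted LLaMA model via the
# open-source llama-cpp-python library. The model path is hypothetical;
# any quantised LLaMA weights in GGUF format would work similarly.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-13b-chat.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,  # context window size
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain photosynthesis to a Grade 8 class."}
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```

A setup along these lines is what allows the meaningful conversations described above, with no cloud service involved.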
It is therefore feasible to make such a computer available in a department so that teachers and students can use it co-operatively and develop good practices for applying GenAI in education.
A new frontier in artificial intelligence is emerging, called “multimodality.” This refers to AI systems that can process and generate content across multiple modes, including text, images, audio and video. Researchers are now developing AI models that can translate between modalities – for example, generating images from text descriptions or synthesising voices based on images.
GPT-Vision, an extension of the paid version of ChatGPT, can interpret visual data such as hand-drawn sketches. For instance, given a hand-drawn sketch of a cylinder with its dimensions marked, GPT-Vision can read the dimensions accurately and then compute the cylinder's volume from the formula V = πr²h, combining visual understanding with computational proficiency.
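To make the computational step concrete, here is the arithmetic the model would perform, with hypothetical dimensions standing in for values read from a sketch:

```python
# Worked example of the cylinder-volume calculation. The radius and
# height are hypothetical, standing in for values read from a sketch.
import math

radius_cm = 5.0   # hypothetical radius marked on the sketch
height_cm = 10.0  # hypothetical height marked on the sketch

volume = math.pi * radius_cm**2 * height_cm  # V = pi * r^2 * h
print(f"Volume: {volume:.1f} cubic cm")      # about 785.4 cubic cm
```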
Recently, Microsoft's Bing Chat AI-powered assistant gained the ability to generate good-quality images from text prompts. The implications are profound. Multimodal AI could enhance accessibility, allowing people with disabilities to interact with content more easily. It also raises concerns about misuse to create misinformation or inappropriate content.
University leaders will need an awareness of multimodality to set policies on AI ethics and to develop curricula. Expertise in multimodal machine learning may become crucial for computer science programmes. We must also consider how these technologies could improve or disrupt teaching and learning. With careful governance, AI’s new expressive powers could make education more creative and inclusive. However, we have to ensure users are guided by ethical principles.