Making a difference with Oracle Academy
Mark Dras
The spotlight is on Mark Dras, Data Science Professor, Macquarie University, Australia.
Macquarie University was established in 1964 as a bold experiment in higher education, breaking from tradition to foster collaboration between students, academics, industry and society. Located 15 km from Sydney’s central business district, Macquarie University is home to more than 44,000 students and 3,000 staff, and awards more than 10,000 degrees each year. The Wallumattagal Campus is set on 126 hectares of parkland, with its own hospital and metro station. More than 300 global companies have a presence on the campus or in the adjoining Macquarie Park high-tech precinct.
The Faculty of Science and Engineering is home to four schools and two research departments. In the School of Computing, Professor Mark Dras is the director of research students, and teaches classes in Algorithms and Data Structures; Data Science and Machine Learning; and Machine Learning.
Dras specializes in natural language processing (NLP) and computational linguistics, and directs the school’s Machine Learning Reading Group.
In 2021, Macquarie University became an Oracle Academy Institutional member and Dras began using the high-performance computing (HPC) resources of the Oracle Academy Cloud Program.
Oracle Academy: We understand that your students use Oracle Cloud Virtual Machines to validate research papers. Can you tell us about that?
Mark Dras: With pleasure. I teach a class focusing on specific applications of data science. This class is for master’s-level students and I need to get them familiar with Artificial Intelligence (AI) projects. A key part of the class is to take what’s been learned previously (on machine learning, data science, and related areas) and apply it in a semester-long replication project.
This involves students working in small groups to select a piece of research work that comes with the open source code and data that make it reproducible. Their task is to replicate the paper, validate it on the original data, and then construct new data to see if the method really succeeds in what it claims to do.
Replication of code and data requires access to high-performance computing, and for this I use the Virtual Machines available from Oracle Cloud.
Oracle Academy: Why did you select Oracle Cloud?
Mark Dras: Before Oracle Academy, I used other HPC providers. But there were limitations on both functionality and classroom support. I needed an environment where you could set up Virtual Machines to handle any programming language, any kind of programming library, or any of the frameworks for doing AI work. With Oracle, I found the ideal platform, where you can spin up Virtual Machines that only exist in the cloud, and run arbitrary code and any kind of deep learning framework to carry out experiments in the AI space. So, in the second half of 2022, I obtained accounts from Oracle Academy and we have been using Oracle Cloud ever since.
One of the neat things about the Oracle Cloud Free Tier offering is that, within the size constraints, you can spin up multiple machines that let you try various things in parallel. In that context, I teach a class on how to scale projects in line with available resources. The experience has been very good. Student access to Oracle Cloud is through a clear interface, and Oracle Academy was extremely helpful in the process of setting up classroom accounts, with useful instructional videos and other materials.
Oracle Academy: Can you give us examples of the research papers replicated by your students?
Mark Dras: Certainly. Last semester our class worked on about a dozen. Let me mention three of them. One group selected a research paper on a new method for object detection. They took the open source code from an associated repository, got it working in an Oracle Cloud Virtual Machine, checked that they could get the same results as in the paper using the original datasets, and then built a new object detection dataset to apply the method to.
Another group chose a paper focused on computer vision and pattern recognition, specifically on Human Pose Estimation, a way of identifying and classifying the joints in the human body from an image. As with the object detection example, students took the code made available with the paper, applied it to a standard set of images with human poses marked on them, and then evaluated the findings.
As a third example, students reproduced an item recommendation system similar to Netflix’s: ‘You watched X so you will probably like Y.’ The scholarly term for this technology is Generative and Discriminative Information Retrieval Model. Once again, students replicated the code and validated the research.
It’s a one-semester project and we have to work fast. One of the challenges I gave them was to find an appropriate piece of work within a certain deadline. I have a default project up my sleeve in case a student takes a long time identifying a piece of work or it turns out to be too complicated.
Oracle Academy: Fascinating. Quite different from the old days of validating research.
Mark Dras: That’s right, in its digital incarnation. How research communities work continually evolves, and different communities have their own practices. Back before code repositories like GitHub, one used to check the validity of work through peer review. It’s the same concept: testing whether an idea is true or not by following the steps described in the paper and coming to the same conclusion.
And because it’s a very old idea, in the first week of my class I give them a whirlwind tour of the history of the philosophy of science. I mention how the English scientist Michael Faraday (1791-1867) discovered principles of electromagnetism and, to demonstrate that his idea was valid, packaged up an example of magnetism, a nail and some magnetite, and sent it to other scientists in Europe for review.
In the current trend of reproducible science, scientific journals and conferences are increasingly expecting that the data and code behind the research be made available, and researchers are increasingly complying.
In a similar way, I encourage my students to insert in their CVs links to the code repository of projects they have worked on in my class. It’s an opportunity to display what they have done; concrete evidence rather than just a project title.
Oracle Academy: Is the class specifically for students who will become researchers?
Mark Dras: Not necessarily. It’s for anyone who wants to get to grips with AI and data science in general. Certainly, it helps prepare students for conducting PhD-level research, and yet the experience can also be used in business, for example in evaluating open source resources as the possible basis for solutions to industry problems.
Oracle Academy: We see that you also teach a class in Machine Learning (ML). Does that also draw on Oracle Academy resources?
Mark Dras: That’s something we are looking into in the context of a long-range teaching strategy involving all areas of computer science. Like many universities, we’re changing the structure of our classes fairly dramatically to cope with all the rapid developments in the ML field. There’s big demand and we are looking at what the best tools might be. We will be splitting the existing ML unit into three classes to focus on natural language processing (NLP), computer vision, and robotics.
In regard to Oracle Academy, there are a lot of resources to be explored. We haven’t yet drilled down into all the components or pre-existing materials. At master’s level there’s a bit of a tradeoff. On the one hand, Oracle Academy provides tools and interfaces that are great because things are nicely packaged up and easy to use. But at the same time, it’s valuable for students to know the nitty-gritty of working with a Command Line Interface, which of course can be done directly through the Oracle Cloud VMs.
And so, this summer we have a retreat for discussing many teaching topics. I was the first to use Oracle Academy Cloud for HPC coursework and will be representing the data science discipline. Among a myriad of topics and resources, we will discuss the use of Oracle Academy resources in other Faculty of Science and Engineering departments. My focus will be on coming up with a common way to use HPC and/or Virtual Machines. Oracle will have a key role to play in that, given that we have signed a university-wide agreement and that I’ve already tried it out and find it excellent.
Oracle Academy: Sounds like a vibrant retreat is in the works. And what are your interests outside of the University?
Mark Dras: Let me first share a machine learning blunder about what they are not! A colleague recently ran me through ChatGPT and, though it was right about my NLP work, it came up with ‘an avid hiker who likes camping and fishing with my friends.’ That’s pure hallucination!
But putting that aside, I have a great interest in history, philosophy and mathematical/theoretical computer science.
I also enjoy manga and anime, and am currently going through Vinland Saga with my kids, delving into a revenge plot around King Cnut, in the Dane-controlled England of the 11th century.
I play video games and also read a lot: most things by Ursula Le Guin, J.R.R. Tolkien, A.S. Byatt, and many others. Exercise-wise my sports are squash and badminton.
Thank you, Mark Dras, for your passion for Oracle Academy and for preparing your students to make a positive impact.