Liming Zhu on AI Research & Process at Australia’s Data61

Dr. Liming Zhu is Research Director of the Software & Computational Systems group in the Data61 business unit of Australia’s CSIRO. (CSIRO is the Commonwealth Scientific and Industrial Research Organisation, Australia’s national science agency.) Dr. Zhu is also a Conjoint Full Professor in the School of Computer Science and Engineering at the University of New South Wales, Sydney, where he received a Ph.D. in 2006.

Among many other works, Dr. Zhu is the author, with Len Bass and Ingo Weber, of “DevOps: A Software Architect’s Perspective (SEI Series in Software Engineering),” published by Addison-Wesley Professional in May 2015. He is also, with Mary Shaw, the author of the Point-Counterpoint article “Can Software Engineering Harness the Benefits of Advanced AI?”, published in the November/December 2022 issue of IEEE SOFTWARE.

Dr. Zhu was invited to serve as the General Chair of ICSSP 2023 because of his involvement with many rapidly developing areas of computer science, coupled with his extensive background in software engineering generally and software process in particular. The ICSSP Steering Committee is grateful to Dr. Zhu for agreeing to take on this role.

This interview with Dr. Zhu was conducted by Stanley Sutton via email during November, 2022.

Questions and Answers

SS (Q1): When I look on the Data61 website, I read that “Data61 is the dedicated data and digital arm of Australia’s national science agency”, with “one of the world’s largest artificial intelligence and data science research and development teams”, “creating Australia’s data-driven future”, and “solving Australia’s greatest data-driven challenges.” So I get the impression that Data61 is a “data forward” organization (which is certainly exciting). But I also see references to processes on many levels, from narrowly technical to broadly social. For example, “Collecting, analysing and storing data”, “turn[ing] big, complex datasets into knowledge”, “train[ing] machine learning algorithms”, “predict[ing] how a bushfire might spread”, and “Reinvent[ing] the way science is done …”

Can you tell us something about the relationship of data and process in the Data61 organization, in conception and in practice? Are processes themselves a subject of research in particular projects?

LZ (A1): Data61 is part of Australia’s national science agency. We help solve the biggest challenges facing Australian business, communities, and government via applied science in data and digital technologies. The challenges our customers face are rarely just about finding a smart algorithm to extract insights from static datasets; they are about the business, system, and software processes of dynamic data collection, continuous data governance, machine learning, deployment, and monitoring. In particular, to make sure data- and AI-driven decisions are trustworthy, we don’t just focus on the data and software artifacts themselves. We apply science to improve the processes that generate the data and artifacts, and the complementary processes for assuring trustworthiness.

For example, we pioneered the responsible AI pattern catalogue work to identify reusable solutions in both processes (governance and process patterns) and data/code (product patterns) and explicitly connect them across multiple dimensions such as lifecycle, stakeholders/roles and system architecture. Within Data61, we apply the same process-data connections to make sure our scientific discovery process has traceability, reproducibility and integrity, which underpins everything.

SS (Q2): You and Mary Shaw authored a very interesting Point-Counterpoint article on the question “Can Software Engineering Harness the Benefits of Advanced AI?” (IEEE SOFTWARE, Nov.-Dec. 2022). To put it very briefly, her point was that the challenges posed by AI and ML (machine learning) can be addressed through the further development of known approaches from software and systems engineering. Your counterpoint was that AI and ML present some fundamentally new opportunities and challenges such that new ways of thinking and new methods are called for.

Based on your remarks in that article, and also looking at the Responsible AI Pattern Catalogue as a representation of that viewpoint, it seems that the harnessing of advanced AI is a societal problem for which software and systems engineering are necessary but not sufficient. Or, in other words, that AI will be harnessed only through the integrated application of ideas and methods from many different disciplines. Is that a fair statement, and could we say that Data61 is working toward an interdisciplinary system of methods to address this issue?

LZ (A2): Yes, I think many challenges we face today require a multi-disciplinary approach. Engineering AI systems particularly needs contributions from other disciplines, for the reasons I outlined in the IEEE Software piece. For traditional systems, we can try our best to manage an interface between the problem domain and the system via well-elicited requirements and specifications. And software/system engineers solve the problem by designing and building a system that satisfies the requirements. For ethical/responsible software issues, we can also capture them via the requirements engineering process and with further support from a professional code of ethics.

However, AI solves problems for us by taking in some data (not proper requirements and specifications in the traditional sense) and providing us with a seemingly working solution that is often inscrutable to us. AIs can continuously learn from field data and human feedback post-deployment and change themselves, with potentially unforeseen consequences.

Facing such challenges, should we work with behaviour psychologists, who have studied humans and the human brain as black boxes for millennia, to study black-box AI (the emerging field of “machine behaviour”)? Should we work with social scientists and ethicists to find more sophisticated ways, beyond requirements engineering, to deal with the wicked problems of ethical AI? Should we work with law professionals and regulators to find the best future-proof way of putting enforceable socio-technical guardrails on increasingly autonomous AI? Should we work with business management and governance gurus to embed AI risk assessment in corporate risk governance at the board/executive level?

I think the answers are yes. Data61, being part of Australia’s national science agency, has access to many experts from other disciplines. We are working with them and the wider industry to tackle these challenges. And at its core, it is a “process” problem – the integrated process to ensure a trustworthy AI.

SS (Q3): You have advocated not only the development of new methods of software engineering but new methods of doing science. Why do you believe that we need new scientific methods, what would they entail, and how do you go about developing a new scientific method?

LZ (A3): When I say new science, I mean gaining new knowledge about AI systems. This new scientific knowledge is needed for devising better engineering methods to design and build AI systems and assure their trustworthiness. For example, if we accept that we cannot fully understand complex AI systems from within, we have to study AI systems from the outside, like behaviour scientists and social scientists studying humans and societies. And even understanding AI systems from within is more akin to science than engineering. Fundamentally, we do not exactly build AI systems. We give AI systems data, and AI builds a seemingly working system that we need to study to understand and trust.

Another example would be neuroscience, which both inspires and is informed by advances in AI. All these are still within the realm of the classical “scientific method” – proposing a hypothesis and doing experiments to confirm or refute the hypothesis. Proposing hypotheses has always been strictly in the human realm, even though we use sophisticated tools or even AI to help us design and do experiments on nature, humans, society or engineered systems. However, AI can also start to identify new hypotheses, or hypotheses unknown to humans that can already be confirmed or refuted by synthesising existing literature.

Certain areas of science can be accused of a “because-I-can” attitude that does not evaluate the human values and risks involved. AI relentlessly exploring the hypothesis space without guardrails can introduce new risks at another scale. So maybe the scientific method in the age of AI also needs to be revised.

SS (Q4): CSIRO has units that address an impressive array of vital concerns. Can you describe some of the interdisciplinary projects Data61 is working on with some of the other units, involving either AI or data in general?

LZ (A4): At CSIRO, we apply AI and associated responsible AI practices to the six national challenges we have identified, with the goals of both solving those challenges and seeding and transforming industries underpinned by deep science and innovation. For example, our AI scientists work with environmental scientists to develop AI solutions that analyse underwater images of the crown-of-thorns starfish for Great Barrier Reef protection. We work with critical infrastructure experts (in Energy and Telecommunications) to use AI for better threat detection and operational efficiency. We also work with Indigenous communities to incorporate their valuable knowledge into land management via AI- and drone-enabled solutions. All these projects require not just smart algorithms but inclusive and participatory processes for problem identification and co-design, where responsible AI practices are embedded end-to-end.

SS (Q5): I imagine that the project participants from different disciplines must all agree in principle to the advantages of an interdisciplinary approach, but do you encounter challenges in working together in practice? Perhaps relating to cultural differences between fields, communication gaps, or different modes of working? If so, then how are these resolved?

LZ (A5): Yes, collaboration between disciplines is challenging. We try to solve this via a few approaches. First, we find that having a very concrete and meaningful challenge with measurable outcomes helps motivate people. It’s better than more general collaborations. Second, we recognise that there are very different types of collaborations. I won’t dive into the differences between multi-disciplinary, cross-disciplinary, interdisciplinary, trans-disciplinary and anti-disciplinary approaches. But they point to the very different roles scientists from one discipline play in collaboration with scientists from another discipline. Consequently, the process of collaboration, including communications, incentives, resource allocation and expectations, will be different. For example, in our responsible AI research, software engineering researchers collaborate with social scientists, law professionals, management experts and AI experts. We structure the collaboration in such a way that they tackle a clear challenge, and everyone clearly sees the different types of value they generate.

SS (Q6): In your experience with various types of collaborations with colleagues from other disciplines, are you able to draw on the Responsible AI Pattern Catalogue? The patterns seem appropriately domain-independent and thus applicable for working out solutions in many problem areas, such as the national challenges targeted by CSIRO. But based on your answer to the previous question, I can also imagine that determining which patterns to use and how to instantiate them could itself be a challenge for teams with diverse participants. Do you have patterns for accomplishing that?

LZ (A6): Many patterns in the Responsible AI Pattern Catalogue are about best practices for managing risks in technologies and innovation via an inclusive process. So they do help with multidisciplinary research collaboration and are applicable to multiple domains. AI does pose some unique challenges due to its black-box and autonomous characteristics. But the complexity of human minds and society is certainly no less. As for selecting the right patterns, we are indeed working on pattern selection methods and on recommending connected patterns that offer an integrated set of risk mitigations.

SS (Q7): Based on our conversation, it seems that there is a growing domain of data processes (construed broadly) that can be seen as contributing to the lifecycle of software and systems—so might “Data Processes” make a good theme for a software and systems process conference?

LZ (A7): Regarding Data Processes, I think we often call them data flows, data pipelines, or even data management/governance (processes). To me, a process always handles data: it takes input data and produces output data. It’s just different types of data – all artifacts can be seen as data: code, design documents, numeric data in databases, ML training datasets. So a Data Process theme may need to be clarified in terms of the types of data involved to be more interesting.
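Dr. Zhu’s view of a process as something that takes input data and produces output data can be illustrated with a minimal sketch (not from the interview; the stage names and record format are hypothetical, chosen only for illustration). Each pipeline stage is just a function from data to data, and the pipeline feeds each stage’s output into the next:

```python
# A minimal sketch of "process = input data -> output data".
# Stage names and the record format are illustrative assumptions.

from typing import Callable, List

# A pipeline stage is simply a function from data to data.
Stage = Callable[[list], list]

def run_pipeline(stages: List[Stage], data: list) -> list:
    """Feed the output of each stage into the next stage."""
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical stages: collect new records, drop invalid ones, transform.
collect = lambda d: d + [{"value": 5}]
clean = lambda d: [r for r in d if r.get("value") is not None]
scale = lambda d: [{"value": r["value"] * 10} for r in d]

result = run_pipeline([collect, clean, scale],
                      [{"value": 1}, {"value": None}])
print(result)  # [{'value': 10}, {'value': 50}]
```

The same shape applies whether the data items are code artifacts, design documents, or ML training records, which is precisely the point made above: the interesting question is what kinds of data flow through the process.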

Thank you, Dr. Liming Zhu
