John Musgrave, Alina Campan, Temesguen Messay-Kebede and David Kapp
Adv. Artif. Intell. Mach. Learn., 4 (1):1959-1976
John Musgrave : University of Cincinnati
Alina Campan : Northern Kentucky University
Temesguen Messay-Kebede : Air Force Research Lab, Wright-Patterson Air Force Base
David Kapp : Air Force Research Lab, Wright-Patterson Air Force Base
DOI: https://dx.doi.org/10.54364/AAIML.2024.41112
Article History: Received on: 20-Jan-24, Accepted on: 07-Feb-24, Published on: 20-Feb-24
Corresponding Author: John Musgrave
Email: musgrajw@mail.uc.edu
Citation: John Musgrave, Alina Campan, Temesguen Messay-Kebede, David Kapp, Boyang Wang (2024). Empirical Network Structure of Malicious Programs. Adv. Artif. Intell. Mach. Learn., 4 (1):1959-1976
A modern binary executable is a composition of several types of networks. Control flow graphs are a commonly used representation of executable programs in classification tasks. Control flow and term frequency representations are widely adopted, but they provide only a partial view of program semantics and make it difficult to increase feature resolution. By performing a quantitative analysis of program networks, we enable the identification of patterns within these features that are correlated with structure. This allows for increases in feature resolution and pattern recognition in classification tasks, which are necessary steps toward greater explainability of classification results. We demonstrate that program data dependency graphs and control flow graphs have scale-free network structure, and show that data dependency graphs also have small-world structural properties. We show that program data dependency graphs have a structurally disassortative degree correlation, and that control flow graphs have neutral degree assortativity, suggesting that random graph models would capture the structural properties of program control flow graphs with increased accuracy. Increased feature resolution allows the structural properties of program classes, as well as those of their component parts, to be analyzed for patterns. By increasing feature resolution within labeled datasets of executable programs, we provide a quantitative basis for interpreting the results of classifiers trained on CFG features. By capturing a complete picture of program networks, we can enable future work in mapping a program's operational semantics to its structure.
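To make the notion of degree assortativity used above concrete, the following is a minimal sketch in Python rather than the authors' implementation. It assumes a simple undirected graph given as a list of edges (program graphs are typically directed, so this is a simplification), and the function name `degree_assortativity` is hypothetical. The coefficient is the Pearson correlation between the degrees at the two ends of each edge: negative values indicate disassortative structure, values near zero indicate neutral mixing.

```python
import math
from collections import defaultdict


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge.

    Assumption: `edges` is a list of (u, v) pairs describing a simple
    undirected graph. Returns NaN when the coefficient is undefined
    (no edges, or every edge endpoint has the same degree).
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    if not xs:
        return math.nan

    mean = sum(xs) / len(xs)  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return math.nan
    return cov / var


# A star graph: one hub connected only to low-degree leaves.
star = [(0, i) for i in range(1, 6)]
# A path graph on four nodes.
path = [(0, 1), (1, 2), (2, 3)]

print(degree_assortativity(star))  # -1.0
print(degree_assortativity(path))  # -0.5
```

The star graph is the extreme disassortative case, since every edge joins the high-degree hub to a degree-one leaf. A graph in which every node has the same degree has no degree variance, so the sketch returns NaN rather than a number.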