SHF: Small: Preponderance of the Evidence for Behavioral Code Similarities

Kaiser, Gail (PI)

Columbia University

Project

Description

Code clones are often produced when a software engineer reuses existing software and then tailors it to a new context. A large system may contain many clones, created either through software reuse or through maintenance in which code was modified to fix bugs or adapt to changing requirements. The ability to detect code clones is fundamental to software development, maintenance and evolution, and the ability to analyze software retrospectively to find the same or similar code is very important to software quality and productivity. Clone detection has been an active area of research for many years. However, clones have traditionally been considered the same or similar if they had the same or similar syntax. This project breaks new ground by detecting semantically similar clones, which may be considered the same or similar even if they look different, i.e., have different syntax. To be semantically equivalent, clones need to have the same behaviors, which is to say they produce the same execution paths from the same inputs.
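As a rough illustration of the distinction drawn above, the following sketch compares two functions that look different but behave alike. It is not the project's method: the function names, the random input generator, and the agreement score are all hypothetical, and the sketch compares only outputs on shared inputs, which is a much coarser notion of behavior than comparing executions.

```python
import random


def sum_loop(values):
    """Iterative sum: one syntactic form."""
    total = 0
    for v in values:
        total += v
    return total


def sum_recursive(values):
    """Recursive sum: different syntax, same input/output behavior."""
    if not values:
        return 0
    return values[0] + sum_recursive(values[1:])


def behavioral_agreement(f, g, trials=200, seed=0):
    """Fraction of random inputs on which f and g return equal results.

    Hypothetical helper: it drives both functions with the same
    randomly generated integer lists and counts matching outputs.
    """
    rng = random.Random(seed)
    matches = 0
    for _ in range(trials):
        data = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        # Pass copies so neither function can affect the other's input.
        if f(list(data)) == g(list(data)):
            matches += 1
    return matches / trials


if __name__ == "__main__":
    print(behavioral_agreement(sum_loop, sum_recursive))
```

A syntax-based clone detector would likely not pair `sum_loop` with `sum_recursive`, whereas running both on the same inputs shows that they agree on every sampled case.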

This project investigates dynamic analysis approaches to identifying behavioral similarities among code elements in the same or different programs, particularly code that behaves similarly during execution but does not look similar, and so would be difficult or impossible to detect using static analysis (code clone detection). While code clone technology is fairly mature, tools for detecting behavioral similarities are relatively primitive. The primary objective is to improve and shape behavioral similarity analysis for practical use cases, concentrating on finding similar code in the same or other codebases that might help developers understand, debug, and add features to unfamiliar code they are tasked to work with. The project seeks to advance knowledge about what it means for code to be behaviorally similar, how dynamic analyses can identify behavioral code similarities, how to drive the executions necessary for these analyses, and how to leverage code whose behavior is reported as highly similar to the code at hand to achieve common software engineering tasks that may be ill-suited to representational code similarities (code clones). The research investigates the utility and scalability of dynamic analyses that seek behavioral similarities in corresponding representations of code executions; techniques for guiding input case generation to produce test executions useful for comparing and contrasting code behaviors for particular use cases; and filtering and weighting schemes that adapt the preponderance-of-the-evidence metaphor to choosing the most convincing similarities for the software engineering task.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status	Finished
Start date/End date	10/1/16 → 9/30/22

Funding

National Science Foundation: $496,571.00

Keywords

Software
Computer Networks and Communications
Electrical and Electronic Engineering
Communication
General
