Article
Version 1
Preserved in Portico. This version is not peer-reviewed.
Harnessing Syntactic Feature for Code Representation Learning
Received: 19 December 2023 / Approved: 19 December 2023 / Online: 20 December 2023 (06:32:05 CET)
How to cite: Clevor, B.; Patel, R.; Slyder, W. Harnessing Syntactic Feature for Code Representation Learning. Preprints 2023, 2023121463. https://doi.org/10.20944/preprints202312.1463.v1
Abstract
Treating code as a dataset has recently gained traction, offering solutions in domains such as automated commit message generation, pull request description generation, and program repair. Consider commit message generation: traditional methods treat source code as a plain token sequence and apply neural machine translation models. This approach overlooks the syntactic structure inherent in programming languages, which could offer deeper insight and improved accuracy. Prior research, specifically the Code2Seq framework, used Abstract Syntax Tree (AST) structure to represent source code for automated method name generation. This paper extends and refines that concept. We introduce "CSR", a novel methodology adapted to represent code edits, and investigate the impact of syntactic structure on the classification of code edits. Drawing inspiration from Code2Seq, "CSR" uses the AST's structural properties, particularly the paths connecting leaf nodes, for code edit classification. We evaluate the approach on two distinct datasets of fine-grained syntactic edits. Our experiments show that incorporating syntactic structure does not significantly outperform simpler methods. While Code2Seq and the proposed "CSR" show potential, our findings indicate that considerable improvement and refinement are needed before such techniques can be universally applied to learning representations of code edits. We anticipate that these findings will spark further research in this field, paving the way for more effective use of syntactic structure in code representation.
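To make the idea of "paths connecting leaf nodes" concrete, the following is a minimal sketch of leaf-to-leaf path extraction in the style of Code2Seq. It is an illustration under stated assumptions, not the CSR implementation: it uses Python's standard `ast` module rather than whatever parser the authors used, treats `Name`, `Constant`, and `arg` nodes as terminals, and the helper names (`leaf_to_leaf_paths`, `path_delta`) and the `max_length` cutoff are hypothetical. The `path_delta` helper only hints at how an edit might be described by the paths it adds and removes.

```python
import ast
from itertools import combinations

# Assumption: these node types play the role of AST leaves (terminals).
TERMINALS = (ast.Name, ast.Constant, ast.arg)


def token_of(node):
    """Return the surface token carried by a terminal node."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    return repr(node.value)  # ast.Constant


def root_to_leaf_chains(tree):
    """Collect, for every terminal, the list of nodes from the root down to it."""
    chains = []

    def walk(node, ancestors):
        chain = ancestors + [node]
        if isinstance(node, TERMINALS):
            chains.append(chain)
            return  # do not descend below a terminal
        for child in ast.iter_child_nodes(node):
            walk(child, chain)

    walk(tree, [])
    return chains


def leaf_to_leaf_paths(source, max_length=8):
    """Return (left token, interior node types, right token) for each leaf pair.

    The interior path climbs from the left leaf to the lowest common
    ancestor and then descends to the right leaf.
    """
    chains = root_to_leaf_chains(ast.parse(source))
    paths = []
    for a, b in combinations(chains, 2):
        shared = 0  # number of ancestors the two chains have in common
        while shared < min(len(a), len(b)) and a[shared] is b[shared]:
            shared += 1
        up = [type(n).__name__ for n in reversed(a[shared:-1])]
        lca = type(a[shared - 1]).__name__
        down = [type(n).__name__ for n in b[shared:-1]]
        nodes = tuple(up + [lca] + down)
        if len(nodes) <= max_length:
            paths.append((token_of(a[-1]), nodes, token_of(b[-1])))
    return paths


def path_delta(before, after):
    """Hypothetical edit summary: paths added and paths removed by an edit."""
    old = set(leaf_to_leaf_paths(before))
    new = set(leaf_to_leaf_paths(after))
    return new - old, old - new


if __name__ == "__main__":
    for path in leaf_to_leaf_paths("x = y + 1"):
        print(path)
    print(path_delta("x = y + 1", "x = y + z"))
```

A token-sequence model would see the edit from `x = y + 1` to `x = y + z` as a single token substitution, whereas the path view records which structural relations between terminals changed. Whether that extra structure helps classification is exactly the question the paper examines.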
Keywords
code edit classification; abstract syntax tree; code structure representation
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.