Title:
CrossVul: A Cross-Language Vulnerability Dataset with Commit Data
Document language:
English
Abstract:
Examining the characteristics of software vulnerabilities and the code
that contains them can lead to the development of more secure software.
We present a dataset (approximately 1.4 GB) containing vulnerable source
code files together with their corresponding patched versions. Unlike
other existing vulnerability datasets, ours includes vulnerable files
written in more than 40 programming languages. Each file is associated
with (1) a Common Vulnerabilities and Exposures identifier (CVE ID) and (2) the
repository it came from. Further, our dataset can be the basis for
machine learning applications that identify defects, as we show in
specific examples. We also present a supporting dataset that contains
commit messages derived from Git commits that serve as security patches.
This dataset can be used to train machine learning models which, in turn, can
detect security patch commits, as we highlight in a specific use case.
Authors:
Nikitopoulos, Georgios
Dritsa, Konstantina
Louridas, Panos
Mitropoulos, Dimitris
Publisher:
Association for Computing Machinery
Conference title:
Proceedings of the 29th ACM Joint Meeting on European Software
Engineering Conference and Symposium on the Foundations of Software
Engineering (ESEC/FSE '21)
Keywords:
Dataset; vulnerabilities; security patches; commit messages
DOI:
10.1145/3468264.3473122