Project Details
Description
A promising direction for cybersecurity is to use machine learning to detect threats and attacks. For instance, machine learning is currently used to detect computer viruses, malware, malicious mobile applications, spam email, and network intrusions. However, a fundamental challenge for using machine learning in this way is concept drift: both threats and normal benign behavior change over time, so machine learning models trained on past data rapidly degrade and become less effective as time passes. Empirically, concept drift is one of the main challenges that make it hard to apply machine learning more broadly in cybersecurity. This project will develop new methods tailored to the cybersecurity domain for addressing concept drift, and it will advance the state of knowledge on robustness against concept drift in cybersecurity. The project has the potential to improve cybersecurity protections for everyday people, including improved antivirus software, phishing detectors, fraud/scam detection, and more, thereby making the Internet safer for everyone.
The team's approach is based on an understanding of the fundamental drivers of concept drift, including both gradual drift and the emergence of entirely new types of threats. Threats can often be grouped into categories. For instance, malware falls into many different 'malware families'. Each category may experience concept drift at a different rate, which provides an opportunity for new methods that take advantage of such differences across categories. For categories that are experiencing rapid concept drift, the team plans to develop techniques that detect which categories are drifting the most and then select samples from those categories for human analysts to evaluate. For new types of threats, the team plans to develop techniques to identify samples from new categories so they can be submitted for human analysis. For categories that are experiencing gradual but sustained concept drift, the team plans to explore the use of semi-supervised learning and pseudo labels to help the machine learning algorithm adapt to these changes in the data.
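The per-category idea described above can be illustrated with a minimal Python sketch. It is not the project's actual method. It assumes that a drop in mean classifier confidence between two time windows is an acceptable proxy for drift in a category, and that high-confidence predictions can serve as pseudo labels. All function names, the data layout, and the 0.95 threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_families_by_drift(old_conf, new_conf):
    """Rank categories by the drop in mean confidence between two windows.

    old_conf / new_conf map a category name to a list of classifier
    confidences observed in an earlier and a later time window.
    Assumption: a larger drop in mean confidence means more drift.
    """
    drop = {}
    for family, new in new_conf.items():
        old = old_conf.get(family)
        if not old or not new:
            continue  # no baseline or no recent data for this category
        drop[family] = mean(old) - mean(new)
    return sorted(drop, key=drop.get, reverse=True)


def select_for_analysts(samples, ranked_families, budget):
    """Pick up to `budget` sample ids for human labeling.

    samples is a list of (sample_id, family, confidence) tuples.
    The most-drifted categories are served first, and within a
    category the least confident samples are picked first.
    """
    by_family = defaultdict(list)
    for sample_id, family, conf in samples:
        by_family[family].append((conf, sample_id))

    chosen = []
    for family in ranked_families:
        for _conf, sample_id in sorted(by_family[family]):
            if len(chosen) == budget:
                return chosen
            chosen.append(sample_id)
    return chosen


def pseudo_label(predictions, threshold=0.95):
    """Keep predictions confident enough to reuse as training labels.

    predictions is a list of (sample_id, predicted_label, confidence).
    The threshold is an illustrative value, not a recommended setting.
    """
    return [(sid, label) for sid, label, conf in predictions if conf >= threshold]


if __name__ == "__main__":
    old = {"family_a": [0.9, 0.9], "family_b": [0.9, 0.9]}
    new = {"family_a": [0.8, 0.8], "family_b": [0.5, 0.5]}
    ranked = rank_families_by_drift(old, new)
    recent = [("s1", "family_a", 0.8), ("s2", "family_b", 0.6), ("s3", "family_b", 0.4)]
    print(ranked)
    print(select_for_analysts(recent, ranked, budget=2))
    print(pseudo_label([("s4", "benign", 0.99), ("s5", "malicious", 0.70)]))
```

In this sketch the labeling budget is spent on the most-drifted category before moving to the next one. A real system might instead split the budget across categories in proportion to their estimated drift, and would need a separate mechanism for flagging samples that belong to no known category.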
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Finished
Effective start/end date: 10/1/22 → 9/30/24
Funding
- National Science Foundation: US$150,000.00
ASJC Scopus Subject Areas
- Artificial Intelligence
- Computer Networks and Communications