Identifying forensically uninteresting files in a large corpus

N. C. Rowe

doi:10.4108/eai.8-12-2016.151725

Authors

N. C. Rowe, Naval Postgraduate School

DOI:

https://doi.org/10.4108/eai.8-12-2016.151725

Keywords:

digital forensics, metadata, files, corpus, data reduction, hashes, triage, whitelists, classification, malware, camouflage

Abstract

For digital forensics, eliminating the uninteresting is often more critical than finding the interesting. We discuss methods that exploit the metadata of a large corpus. Tests were done with an international corpus of 262.7 million files obtained from 4018 drives. For malware investigations, we show that using a Bayesian ranking formula on metadata can increase malware recall by a factor of 5.1 while increasing precision by a factor of 1.7 over inspecting executables alone. For more general investigations, we show that requiring two of nine criteria for uninteresting files, with exceptions for some special interesting files, can exclude 77.4% of our corpus. For a test set that was manually inspected, 0.18% of interesting files were identified as uninteresting and 29.31% of uninteresting files were identified as interesting. The generality of the methods was confirmed by separately testing two halves of our corpus. This work provides both new uninteresting hash values and programs for finding more.
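The "two of nine criteria, with exceptions" rule can be read as a simple voting scheme over file metadata. The following is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not name the nine criteria or the exceptions, so the three criteria, the single exception, the metadata field names (`path`, `size`, `drives_with_hash`), and the thresholds below are all hypothetical placeholders.

```python
from typing import Callable, Dict, List

FileMeta = Dict[str, object]            # hypothetical metadata record for one file
Criterion = Callable[[FileMeta], bool]  # returns True if the file matches the test

# Hypothetical criteria for illustration only; the abstract does not list the nine.
EXAMPLE_CRITERIA: List[Criterion] = [
    lambda m: m.get("drives_with_hash", 0) >= 5,   # same hash seen on many drives
    lambda m: str(m.get("path", "")).lower().startswith("windows/system32/"),
    lambda m: m.get("size", 1) == 0,               # empty file
]

# Hypothetical exception: file types kept as interesting regardless of votes.
EXAMPLE_EXCEPTIONS: List[Criterion] = [
    lambda m: str(m.get("path", "")).lower().endswith(".eml"),
]


def is_uninteresting(
    meta: FileMeta,
    criteria: List[Criterion],
    exceptions: List[Criterion],
    min_votes: int = 2,
) -> bool:
    """Flag a file as uninteresting when at least `min_votes` criteria match,
    unless any exception rule marks it as a special interesting file."""
    if any(exc(meta) for exc in exceptions):
        return False
    votes = sum(1 for crit in criteria if crit(meta))
    return votes >= min_votes
```

Requiring agreement between at least two independent criteria, rather than trusting any single one, is one plausible way to keep the rate of interesting files wrongly excluded low, at the cost of leaving more uninteresting files in the set to be inspected.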

Published

08-12-2016

Issue

Vol. 3 No. 7 (2016): EAI Endorsed Transactions on Security and Safety

Section

Research article

License

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 4.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.

How to Cite

Rowe NC. Identifying forensically uninteresting files in a large corpus. EAI Endorsed Trans Sec Saf [Internet]. 2016 Dec. 8 [cited 2026 Jul. 26];3(7):e2. Available from: https://publications.eai.eu/index.php/sesa/article/view/513
