DAFuzz: data-aware fuzzing of in-memory data stores

PeerJ Comput Sci. 2023 Sep 19;9:e1592. doi: 10.7717/peerj-cs.1592. eCollection 2023.

Abstract

Fuzzing has become an important method for finding vulnerabilities in software. For programs that expect structured inputs, syntax-aware and semantic-aware fuzzing approaches have been proposed. However, these approaches still cannot fuzz in-memory data stores sufficiently, because some code paths are executed only when the required data are present in the store. In this article, we propose DAFuzz, a data-aware fuzzing method designed around the data used during fuzzing. Specifically, to ensure that different data-sensitive code paths are exercised, DAFuzz first loads different kinds of data into the stores before feeding fuzzing inputs. Then, when generating inputs, DAFuzz ensures that they are not only syntactically and semantically valid but also use the data correctly. We implement a prototype of DAFuzz based on Superion and use it to fuzz Redis and Memcached. Experiments show that DAFuzz covers 13% to 95% more edges than AFL, Superion, AFL++, and AFLNet, and discovers vulnerabilities over 2.7× faster. In total, we discovered four new vulnerabilities in Redis and Memcached. All of them were reported to the developers and have been acknowledged and fixed.
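The two steps described in the abstract (preload typed data, then generate inputs that use that data correctly) can be illustrated with a minimal sketch. This is not DAFuzz's implementation: the key names, the `PRELOADED` table, the command templates, and the functions `load_phase` and `generate_input` are all hypothetical, chosen only to show the idea with standard Redis commands. The sketch builds command strings and does not connect to a server.

```python
import random

# Hypothetical seed data: key name -> Redis data type held under that key.
PRELOADED = {
    "str:1": "string",
    "list:1": "list",
    "hash:1": "hash",
    "set:1": "set",
}

# Commands that create each kind of data before fuzzing starts.
LOAD_COMMANDS = {
    "string": "SET {key} hello",
    "list": "RPUSH {key} a b c",
    "hash": "HSET {key} field value",
    "set": "SADD {key} a b c",
}

# Command templates grouped by the data type they operate on.
TEMPLATES = {
    "string": ["GET {key}", "APPEND {key} x", "STRLEN {key}"],
    "list": ["LPOP {key}", "LLEN {key}", "LRANGE {key} 0 -1"],
    "hash": ["HGET {key} field", "HLEN {key}", "HKEYS {key}"],
    "set": ["SMEMBERS {key}", "SCARD {key}", "SPOP {key}"],
}


def load_phase():
    """Return the commands that populate the store with each data type."""
    return [LOAD_COMMANDS[t].format(key=k) for k, t in PRELOADED.items()]


def generate_input(rng, length=5):
    """Generate commands whose key always holds the type the command expects."""
    keys_by_type = {}
    for key, data_type in PRELOADED.items():
        keys_by_type.setdefault(data_type, []).append(key)

    commands = []
    for _ in range(length):
        data_type = rng.choice(sorted(keys_by_type))
        key = rng.choice(keys_by_type[data_type])
        template = rng.choice(TEMPLATES[data_type])
        commands.append(template.format(key=key))
    return commands


if __name__ == "__main__":
    rng = random.Random(0)
    for line in load_phase() + generate_input(rng):
        print(line)
```

A generator that picked keys without regard to type would mostly trigger wrong-type errors (for example, `LPOP` on a string key), so the type-specific code paths would rarely run. Pairing each command with a preloaded key of the matching type is the property this sketch is meant to show.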

Keywords: Coverage-based fuzzing; Coverage-guided fuzzing; Data-aware; In-memory data store; Input generation; Semantic-aware.

Grants and funding

This work was supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LY22F020022, the National Natural Science Foundation of China under Grant No. 61902098, the Key Research Project of Zhejiang Province, China under Grant No. 2023C01025, and the “Pioneer” and “Leading Goose” R&D Program of Zhejiang under Grant No. 2023C03203. There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.