discussion.tex

\chapter{DISCUSSION}
\section{Lessons Learned}
Throughout the course of this thesis, an abundance of lessons were learned with respect to designing a user interface tailored to the user’s experience and a user study able to capture the efficacy of the DDG. The lessons outlined in this section will serve as the motivation for potential future work.  

The results of the survey reveal that a sizable portion of the study-base found the user interface confusing or lacking in desired functionality. Largely, the suggested functional additions and improvements promote better interaction between the user and the generated data dependency graph. Asking the user for a trace file and displaying the entire data dependency graph is not in line with the user’s expectations or interests. Rather than displaying the entire graph at once, with all the nodes in the provided trace file’s history and expecting the user to utilize the search functionality to navigate, it would be better to allow the user to filter down the graph prior to displaying it. The analysis’ backend currently supports more fine-grained approaches to DDG generation but are not reflected by endpoints in the user interface. Possible improvements will be described in the future work section.

As for the design of the user study, the use of randomization had its drawbacks. While instrumental in eliminating the effects of fatigue and progressive learning, the randomization of challenge order without respect to challenge difficulty proved problematic. While the effect of progressive learning was eliminated, this proved counter-intuitive when the user was randomly assigned one of the more difficult challenges first. In their feedback, some participants noted that they would have appreciated starting with one of the later challenges first to better learn data dependency view prior to applying it to a more difficult challenge. A possible solution that would still eliminate the effect of progressive learning but also resolve the issue of users facing more difficult challenges first would be to have each challenge scale its difficulty to correlate to its randomized order in order to maintain an easy $\xrightarrow[]{}$ medium $\xrightarrow[]{}$ hard progression.

In addition to the effects inherent in randomization, whichever challenge the user was initially assigned would have a completion time swayed by factors excluding difficulty. As many participants did not have prior experience with \code{angr management}, asking them to learn the basics of interacting with the framework, the novel view, and solving a challenge all at once had a definite sway on the time it takes to complete the first task. 

After analyzing the results, the importance of wording in questions and objectivity were understood to be paramount. Questions concerning the user's self-perceived skill level was not statistically tied to results, and thus those types of questions proved useless. Furthermore, the wording of the median questions proved confusing to some participants. This required more lenient grading that took into account the user's free-form feedback as opposed to grading their multiple choice answers exclusively. 

As evidenced by feedback asking for functionality that was already explained on the introductory page, the medium by which the introductory lesson was delivered was not effective. Research has shown that product manuals are not an effective means of conveying information to users. Most people do not read instruction manuals and will utilize a fraction of a product's features and functionality \citep{blackler2016life}. Rather than presenting the introduction as a static webpage, a video would have been more engaging and provided users with a clearer demonstration of the features at their disposal. The decision to convey feature implementation had a serious negative effect on user perception of the data dependency view, with users requesting zoom functionality, subgraph functionality, search functionality, and node highlighting. All of these features were explained in the introduction. 

The largest lesson learned was the downsides of deploying this experiment on a cloud-based framework. While the use of a cloud-based solution allowed for participants from the wider cybersecurity community as well as more flexibility with the participant’s schedules, many users reported latency as a discomfort during the challenge-solving phase. While already being a network-intensive protocol, RDP becomes far more unusable with distance. Many interested participants from Europe and Asia were unable to complete the experiment due to the unbearable latency. The latency inherent to RDP even had a negative impact on user feedback in regards to the views comprehensibility. As evident from the free-form feedback, two participants found the view confusing due to the lag they encountered while using data dependency view. This will have had an effect on the data, despite being an independent variable. Future studies would have to weigh this consideration into account when considering experiment deployment methods.
In addition to relying on a server to conduct this experiment, participation was very much at the mercy of the server’s uptime and performance. Early in the user study, the server suffered from thrashing and would shut down under high load. Prior to this error being resolved, various participants would either have to have their results thrown out or were discouraged from continuing. Thus, if a cloud-based solution were to be utilized in a future user study, more servers would need to be deployed globally, clustered by participant locality. Having multiple servers would also serve as fault-tolerance, with one malfunctioning server offloading its work onto its peers to prevent interruption of participation.

\section{Future Work}
As evidenced by the feedback received and discussed in the lessons learned, work must be done to improve the user interface. Possible improvements and reworks, as aggregated from participant feedback, will now be described in greater detail. 

About 1/5 of users reported a desire for a more cohesive experience when using data dependency graph. Rather than loading a trace file for the current binary, these users stated they would prefer the option to right-click the constants or instructions in the disassembly view that they would like to generate a data dependency graph of. The current implementation could easily be pivoted to accommodate this modality of interaction. A symbolic execution state could be initialized at the earliest instruction in the user-specified selection and allowed to run, with a find target of the end instruction. Should there be a path that exists, the state at the end of symbolic execution could be passed to the data dependency analysis.

While the aforementioned solution will work, symbolic execution is an expensive operation. If the user were to only use data dependency on one or two instruction subsets during their analysis, then no further optimizations are necessary. Although, should the user wish to generate many DDGs, the lag between the start of the task and the generation of the view would be an annoyance. A possible solution would be to move the data dependency generation to a background thread that, upon completion of the underlying analysis, emits a Qt signal to inform the main thread that it can now generate the view. While the same amount of time would be required to generate the view, the UI would not be locked until completion and the user would be free to continue their analysis elsewhere.  

Another possible optimization could see the generation of a DDG being queued upon the user’s first request to see one. Regardless of the instruction subset requested, an initial request could see the data dependency analysis being dispatched in a background thread for the entire program-space. While the first data dependency graph requested would be slower, all subsequent calls could just utilize the subgraph functionality and be near instantaneous. Although, this solution has many trade-offs: increased computation that may go unutilized, more utilization of RAM to store the larger Digraph, and some binaries being far too large to warrant a DDG being generated for the entire program-space.

User feedback suggested that the current user interface provides a jarring experience when attempting to manually follow a dependency in the graph. For one, the scrolling speed resulted in users losing their place in the graph. Additionally, users reported a desire for color-coding certain paths branching out of a node to make manual tracing more feasible. This problem is compounded by the clutter produced by the arrows in the graph which should also be addressed.

The last category of improvements suggested by the users was improvements to searching. The search panel should allow users to specify if they would like to search within the current sub-graph or within the entire data dependency graph, as a few users voiced their desire for this functionality. Users also requested the ability to search for instructions using regex. 

Future work on data dependency graph should be focused on bridging the divide between data dependency graph and disassembly view / decompilation view. The ability to more quickly switch between the control flow graph and the data dependency would address the majority of feature requests from the participant pool.

Another promising vein of research for inclusion in data dependency graph is the application of taint analysis in deciding how graphs should be generated \citep{clause2007dytan, newsome2005dynamic, zhu2015dytaint}. The application of taint analysis could make data dependency graph more relevant in the realm of vulnerability analysis, as a correlation could not be established for the view proposed in this experiment. With the ability to specify sources (user-controllable origins of data) and sinks (potentially vulnerable instructions or functions) a data dependency graph could be generated that shows any existing connections between the source and the sinks. With the ability to trace forwards from the source to a sink or backwards from a sink to the source, the user would be presented with the series of checks and input restrictions required to exploit the vulnerable code in question. Furthermore, breadth-first search could be employed to find the shortest path from a given source to a given sink if multiple dependency paths exist (to make exploitation easier). Presenting the shortest path as opposed to the entire data dependency graph would also help prevent overloading the user with a massive dependency tree.