    Density Estimation and Anomaly Detection in Large Social Networks

    Principal Investigator: Svetlana Lazebnik
    Funding Agency: Duke/U.S. Army Research Office
    Agency Number: 09-ARO-1103

    Abstract
    This proposal describes a three-year effort to develop scalable and robust statistical methods for analyzing interactions within a large social network using hypergraph data structures. When monitoring such a network, meetings or contacts between its members are recorded; the proposed work addresses the problem of using these recorded meetings to determine (a) whether each meeting is anomalous, (b) how likely a given meeting is to occur, and (c) which other people are likely to interact with a given list of people. The focus is on learning the typical behavior patterns of network members, which are important predictors of future activity within the network. Such targeted querying capabilities should prove very useful in intelligence analysis and counterterrorism, and accurate assessment of a meeting’s anomalousness can help the Army allocate its limited resources more effectively.


    Several aspects of the problem make it particularly challenging. First, the number of observations may be very small relative to the size of the social network; this “data-starved” setting makes devising statistically robust and computationally efficient methods especially difficult. Second, in many real-world settings, we do not have access to all events at once, which precludes batch processing and necessitates online methods instead. Third, the mechanism underlying the observed events may be dynamically changing.


    The approach outlined in this proposal constitutes a significant departure from conventional graph-based methods. It examines novel, computationally tractable techniques for density estimation and anomaly detection in this high-dimensional setting, built on the following key concepts:

    • Hypergraph theory. Hypergraphs are an important extension of graphs that allow edges to connect more than two vertices simultaneously, enabling an efficient, information-preserving representation of events (a sketch of this representation follows the list).
    • Variational approximation. A variational approximation of multivariate distributions leads to extremely efficient methods for density estimation and Monte Carlo sampling.
    • Recursive orthogonal series estimation. Tractable orthogonal series estimators will find a subset of the coefficients so that, with high probability, “significant” coefficients are retained and “insignificant” coefficients are not.
    • Prediction with expert advice. The framework of prediction of individual sequences with expert advice facilitates fitting a time-varying mixture model to the observed sequence and deriving theoretical performance bounds (a sketch of such a forecaster also follows the list).
    • Particle filtering. Particle filtering provides principled parameter updates for the base experts.
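
    To make the hyperedge representation concrete, the following is a minimal sketch in Python, with hypothetical data and function names rather than the proposal's actual estimators: each recorded meeting is stored as a hyperedge, i.e. the full set of attendees, and simple frequency counts over hyperedges already support crude versions of queries (b) and (c) above.

        from collections import Counter

        # Each meeting is a hyperedge: the full set of attendees, kept as one unit
        # rather than being broken into pairwise graph edges.
        meetings = [
            frozenset({"a", "b", "c"}),   # hypothetical observations
            frozenset({"a", "b"}),
            frozenset({"b", "c", "d"}),
            frozenset({"a", "b", "c"}),
        ]

        # Empirical hyperedge frequencies give a crude density estimate over meetings.
        edge_counts = Counter(meetings)
        total = sum(edge_counts.values())

        def meeting_likelihood(meeting):
            """Relative frequency of an exact meeting among the observations (query b)."""
            return edge_counts[frozenset(meeting)] / total

        def likely_coattendees(query_members, top_k=3):
            """Rank other members by how often they co-occur with the query set (query c)."""
            q = set(query_members)
            co = Counter()
            for edge, count in edge_counts.items():
                if q <= edge:
                    for member in edge - q:
                        co[member] += count
            return co.most_common(top_k)

        print(meeting_likelihood({"a", "b", "c"}))   # 0.5
        print(likely_coattendees({"b"}))             # e.g. [('a', 3), ('c', 3), ('d', 1)]

    In the data-starved setting described above, raw counts like these are unreliable, which is precisely what the proposed density estimation machinery (variational approximation, recursive orthogonal series estimation) is intended to address.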
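
    The expert-advice component can be illustrated with the standard exponentially weighted forecaster, shown below as a textbook instance rather than the proposal's exact algorithm; all names and parameters here are hypothetical. Each base expert assigns a probability to the next observed meeting, and its mixture weight is updated multiplicatively from its log loss, so experts that track the current behavior pattern come to dominate as that pattern drifts.

        import math

        def exp_weighted_mixture(experts, sequence, eta=0.5):
            """Run an exponentially weighted forecaster over an event sequence.

            experts: callables mapping a meeting to a probability in (0, 1].
            Returns the per-step mixture probabilities and final expert weights.
            """
            weights = [1.0 / len(experts)] * len(experts)
            mixture_probs = []
            for meeting in sequence:
                probs = [expert(meeting) for expert in experts]
                # Probability the current mixture assigns to the observed meeting,
                # recorded before the weights see this observation.
                mixture_probs.append(sum(w * p for w, p in zip(weights, probs)))
                # Exponential-weights update: w_i <- w_i * exp(-eta * loss_i),
                # with log loss loss_i = -log p_i(meeting), i.e. w_i * p_i**eta.
                weights = [w * p**eta for w, p in zip(weights, probs)]
                z = sum(weights)
                weights = [w / z for w in weights]
            return mixture_probs, weights

        # Hypothetical usage: two constant baseline experts over a short sequence.
        probs, w = exp_weighted_mixture(
            [lambda m: 0.8, lambda m: 0.2],
            [frozenset({"a", "b"}), frozenset({"a", "b", "c"})],
        )

    In the proposed framework the base experts would themselves be density estimators whose parameters are updated online, which is where the particle filtering component enters.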