Monday, 7 July 2008

Statistical Methods to Determine the Authenticity of Data

I am presenting at CACS (Sydney 2008), the ISACA conference. My presentation, which I am finishing tonight for submission, is for session AUD 122.
Its title is "Statistical Methods to Determine the Authenticity of Data".

The presentation will address statistical methods including:

  • Methods ranging from PCA (principal component analysis) to RF (random forests),
  • Classification And Regression Trees (CART) and Decision Trees in forensics,
  • Multivariate adaptive regression splines (MARS) in quantitative structure-retention relationships, as applied to email header information,
  • Classification and regression tree analysis for email header descriptor selection,
  • The evaluation of Two-Step Multivariate Adaptive Regression Splines for email analysis.
Next, the presentation will address text mining techniques that may be used to correlate a new event with an anthology of prior events in order to determine its authenticity.
This increases the ability to detect events of interest and limits the error rate.
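
As a rough illustration of correlating one event against an anthology of prior events, here is a minimal sketch using bag-of-words cosine similarity. It is not the method from the presentation, only one simple way to express the idea. The log lines, the tokenizer, and the function names (`tokenize`, `cosine`, `anthology_score`) are all made up for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def anthology_score(event, anthology):
    """Highest similarity between the event and any prior event."""
    ev = Counter(tokenize(event))
    return max((cosine(ev, Counter(tokenize(p))) for p in anthology),
               default=0.0)

# Hypothetical anthology of prior events (invented for illustration).
anthology = [
    "sshd accepted password for alice from 10.0.0.5",
    "sshd accepted password for bob from 10.0.0.7",
    "cron session opened for user root",
]

typical = "sshd accepted password for alice from 10.0.0.7"
unusual = "kernel module loaded from tmp directory"

typical_score = anthology_score(typical, anthology)
unusual_score = anthology_score(unusual, anthology)
print(typical_score, unusual_score)
```

An event that scores low against the whole anthology is a candidate event of interest. In practice the raw counts would likely be replaced with weighted terms, and the cut-off would need tuning against known data.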

The development of quantitative methods of analysis to detect tampering with logs offers great promise for the future of security and digital forensics. New methods that quantify the statistical correlation between events and a log anthology from the subject, using PCA (principal component analysis), CART decision trees, and MARS predictive modelling to assign a probability that a log event is associated with that subject, are expanding the forensic arsenal.
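
To make the idea of a tree assigning a probability to a log event concrete, here is a minimal sketch of a single CART-style split chosen by Gini impurity. A full CART model grows many such splits and then prunes them, so this is only the basic building block. The feature (seconds between consecutive log entries), the labels, and the data are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (weighted impurity, threshold) of the best single split."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= thr]
        right = [l for v, l in pairs if v > thr]
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or w < best[0]:
            best = (w, thr)
    return best

def leaf_probability(values, labels, thr, x):
    """Share of label 1 among training points on the same side as x."""
    side = [l for v, l in zip(values, labels) if (v <= thr) == (x <= thr)]
    return sum(side) / len(side)

# Hypothetical training data: gap in seconds before each log entry,
# and whether that entry was later found to be tampered with (1) or not (0).
gaps = [1, 2, 3, 4, 5, 30, 40, 50, 60, 70]
tampered = [0, 0, 0, 0, 0, 1, 1, 1, 0, 1]

impurity, threshold = best_split(gaps, tampered)
p_long_gap = leaf_probability(gaps, tampered, threshold, 45)
p_short_gap = leaf_probability(gaps, tampered, threshold, 2)
print(threshold, p_long_gap, p_short_gap)
```

The class proportion in the leaf is what serves as the probability estimate. With real logs there would be many features and a separate held-out set to check the error rate.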

MARS and regression tree analysis may be used together to achieve the best prediction success. The CART model can be difficult to use for cartographic purposes due to its high model complexity, but it also adds to the predictive capability in cases where a large test set (or email anthology) is available.
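
One reason the two approaches complement each other is that a regression tree predicts in steps, while MARS builds its model from hinge functions of the form max(0, x - knot), giving piecewise-linear fits. The following is a minimal sketch of a single hinge term with a brute-force knot search. It is a heavily simplified stand-in for MARS, which uses mirrored hinge pairs, many terms, and a pruning pass. The data and function names are made up for this example.

```python
def hinge(x, knot):
    """MARS-style hinge basis function: zero up to the knot, linear after."""
    return max(0.0, x - knot)

def fit_line(h, y):
    """Ordinary least squares for y = b0 + b1 * h with one predictor."""
    n = len(h)
    mean_h = sum(h) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_h) ** 2 for a in h)
    if sxx == 0:
        return mean_y, 0.0  # constant predictor: fall back to the mean
    slope = sum((a - mean_h) * (b - mean_y) for a, b in zip(h, y)) / sxx
    return mean_y - slope * mean_h, slope

def fit_single_hinge(xs, ys):
    """Try each observed x as a knot; keep the fit with the lowest SSE."""
    best = None
    for knot in sorted(set(xs)):
        h = [hinge(x, knot) for x in xs]
        b0, b1 = fit_line(h, ys)
        sse = sum((b0 + b1 * a - y) ** 2 for a, y in zip(h, ys))
        if best is None or sse < best[0]:
            best = (sse, knot, b0, b1)
    return best[1], best[2], best[3]

# Hypothetical response: flat at 2, then rising by 3 per unit after x = 4.
xs = list(range(10))
ys = [2, 2, 2, 2, 2, 5, 8, 11, 14, 17]

knot, b0, b1 = fit_single_hinge(xs, ys)
print(knot, b0, b1)
```

A tree fitted to the same data would need several splits to approximate the rising section, whereas the hinge captures it with one term. Conversely, a tree handles abrupt jumps that a hinge cannot.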

This creates a methodology that increases accuracy and makes fraud detection easier.

All this and more.
