Saturday, 1 December 2007

Cookie Primer.

Contrary to the claim that no information is sent from your computer to anybody outside your system, the majority of cookies are interactive: information is not only written to them, but also read back from them by the web servers you connect to.

Cookies are an HTTP mechanism that is widely used by Web servers to store information on a Web client. This information takes the form of a small amount of text, which is transmitted in special HTTP headers.

Tools such as WebScarab can be used to view cookies inbound over the wire. This is more effective than browser controls because:

  • JavaScript embedded in the web page may be used to create cookies in cases where IE and other browsers will not prompt the user
  • The cookie could have been stored on the hard drive during a previous session. In this case it is not a new inbound cookie and the browser will not prompt the user

Persistent Cookie (File based and stored on Hard Drive)
Persistent cookies are stored on the hard drive and expire at a future date. They are only deleted by the browser after the Expires date passes, and this deletion assumes that the browser has been opened after that time.

Session Cookie (Memory Based)
Session cookies expire either at their expiry date or when the browser is closed. This is because they are held only in memory and are not written to disk.

Cookie Flow

The basic flow is simple: the server sets a cookie in the headers of its response, and the browser then returns that cookie in the headers of every subsequent request whose host and path match the cookie's Domain and Path fields.

Cookie Headers
HTTP 1.0 (the original Netscape specification and RFC 2109)

  • Set-Cookie from server to browser
  • Cookie from browser to server

HTTP 1.1 (RFC 2965, rarely used in practice)

  • Set-Cookie2 from server to browser
  • Cookie2 from browser to server

The relevant standards specify that a Web browser should allow at least 4096 bytes per cookie.

The 6 Parts of a Cookie
(Netscape specifications – all similar, but there are some differences in terminology)
Name
This is an arbitrary string used to identify the cookie so that a Web server can send more than one cookie to a user.
Domain
The domain specification contains the range of hosts to which the browser is permitted to send the cookie. This is generally a DNS name suffix and, as it relies on DNS, can be spoofed.
Path
This is the range of URLs within which the browser is permitted to transmit the cookie.
Expires
The time on the host system at which the browser must expire or delete the cookie.
Secure
This flag signifies that the cookie will only be sent over an SSL-protected connection.
Data
The data section is the arbitrary string of text carried within the cookie.
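
As a concrete illustration of these parts, the sketch below pulls them out of a raw Set-Cookie header using base R. The header value is a made-up example, not one captured from a real server.

```r
# A minimal sketch (base R only) of splitting a Set-Cookie header into the
# parts described above.  The header string is a fabricated example.
set_cookie <- "SESSIONID=abc123; Domain=.example.com; Path=/; Expires=Sat, 01-Dec-2007 12:00:00 GMT; Secure"

# Split the header on ";" to separate the name=value pair from its attributes
parts <- strsplit(set_cookie, ";\\s*")[[1]]

# The first element is always the Name=Data pair
name_value <- strsplit(parts[1], "=")[[1]]
cookie <- list(
  name   = name_value[1],
  data   = paste(name_value[-1], collapse = "="),  # data may itself contain "="
  domain = NA, path = NA, expires = NA, secure = FALSE
)

# The remaining elements are the Domain, Path, Expires and Secure attributes
for (attr in parts[-1]) {
  kv  <- strsplit(attr, "=")[[1]]
  key <- tolower(kv[1])
  val <- paste(kv[-1], collapse = "=")
  if (key == "domain")  cookie$domain  <- val
  if (key == "path")    cookie$path    <- val
  if (key == "expires") cookie$expires <- val
  if (key == "secure")  cookie$secure  <- TRUE
}

str(cookie)
```
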
Part 7 of 6 – P3P Field
Not technically part of the cookie, the “Platform for Privacy Preferences” (P3P) field is a compact policy sent by the Web server in an HTTP header. IE 6 browsers enable users to automatically accept cookies from sites with certain privacy policies.

The P3P specification may be found at: http://www.w3.org/TR/P3P/

Some links on Cookies:
http://dtp-aus.com/cookies.htm
http://www.d-j-whiley.freeserve.co.uk/cookie.html
http://www.cookiecentral.com/

Cookies and the Law
Companies use cookies as a means of accumulating information about web surfers without having to ask for it. Cookies attempt to keep track of visitors to a Web site and to track state (as HTTP is stateless). The information that cookies collect from users may be profitable both in the aggregate and at the level of the individual. Whether the convenience that cookies provide outweighs the loss of privacy is a question each Internet user must decide for him or herself. [3]

America OnLine has been accused of selling database information about its users. [4] This has led to an effort on the part of cookie proponents to control the amount of information that cookies collect. [5] The Federal Trade Commission determined that Geocities, a popular web site where users input personal information, was selling information in apparent violation of its own privacy policy. [5]

Criticism of cookies has centred on the fear of a loss of privacy, an issue that arises primarily from tracking cookies.

Tracking Cookies
HTTP is made “session-full” (stateful) by using either:

  • URL re-writing
  • Cookies

In the domain and path fields of the cookie, an overly broad domain entry will allow the user’s browser to transmit the cookie to any machine in the domain listed. A cookie with “.com” in the domain field and a path of “/”, for instance, will allow any host in the “.com” domain to receive the cookie.

This is of course a privacy concern.
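
A rough sketch of that tail-matching rule in base R; the function name and the example hosts are mine, purely for illustration (modern browsers add further restrictions, but the basic rule is as described above).

```r
# Rough sketch of cookie domain "tail matching": a host may receive a cookie
# if its name ends with the cookie's Domain attribute.
domain_matches <- function(host, cookie_domain) {
  # Build a $-anchored pattern with literal dots: does the host end in the domain?
  pattern <- paste0(gsub("\\.", "\\\\.", cookie_domain), "$")
  grepl(pattern, host)
}

# A cookie scoped to ".com" with Path "/" would be offered to any of these hosts:
hosts <- c("www.example.com", "ads.tracker.com", "shop.another.com")
domain_matches(hosts, ".com")          # TRUE TRUE TRUE  - hence the privacy concern
domain_matches(hosts, ".example.com")  # TRUE FALSE FALSE - a properly scoped cookie
```
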

Tracking cookies are often used by advertising firms, who have their clients create a cookie that may be collected by any domain. In this way they can collect information that can be stored in databases for later correlation.

Cookies are generally legal. The issue arises when poorly configured cookies are used, and on this point the case law is sketchy at best. In Europe, under the privacy provisions of the EC, it could be argued that accessing a tracking cookie that you did not specifically create (as is done by the advertising companies) is technically illegal.

In a similar fashion, it could be argued that access by third parties to cookies is an unauthorised access to data under US federal law.

The problem with both of these examples is that the law is untested. The easiest path is generally to seek a breach of contract (a privacy policy as sent within a cookie is a legally enforceable contract). In Europe, breach of this contract could be a criminal offence.

An issue is that DNS spoofing is easy (and cookies rely on DNS).

References

  1. Peter Krakaur, Web Cookies, Fortune Cookies, and Chocolate Chip Cookies.
  2. Stephen T. Maher, Understanding Cookies: A Cookies Monster?
  3. Jonathan Rosenoer, Cyberlex, July 1997.
  4. See Webcompanies Announce Privacy Standards (announcing partnership of Netscape, VeriSign and Firefly).
  5. Internet Site Agrees to Settle FTC Charges of Deceptively Collecting Personal Information Agency's First Internet Privacy Case, FTC News Release, August 13, 1998.
  6. See, e.g., Deja News.

Friday, 30 November 2007

Emergent Human Psychology

There exist a number of different schools of thought, or perspectives, that have traditionally been used to understand human behaviour: the psychodynamic, behaviourist, humanistic, cognitive, and evolutionary perspectives. It must first be noted that each of these approaches is a model. As a consequence, each particular school of thought may be represented as correlating with, or diverging from, individual incidents or groups of results, based on its particular assumptions.

Into this mix of models or schools of thought should be added emergence and emergent behaviour, and consequently emergent psychology. In this we would have to address socialisation across groups of autonomous systems, with the individual treated both as an isolated point function in the social group and also as a complex interaction of dynamic forces. Emergent psychology may then be represented through neurobiological interactions at the individual level, through to divergent patterns of group behaviour as expressed within the bounds of complexity theory.

In human psychology, the individual in this school of thought derives from self-organised criticality once a certain level of physical and biological system complexity is exceeded. Rather than isolating humanity as a creationistic anomaly or outlier, this school of thought would look at humanity through its progressive trend of increasing complexity. In this, complexity has increased in both a biological and a sociological context.

Biologically, humanity has increased in complexity through an anthropological progression over millions of years, through a process of evolution and genetic change. Sociologically, humanity has progressed through a series of un-contained experiments. In this, a gradual increase in complexity has derived from social evolution, which in a sense coincides with evolutionary theory: social groups that are “fitter” for their surrounding conditions (including their interactions with other groups) survive and out-compete their less successful counterparts.

In the theory of emergent psychology, the psychology of mind starts to emerge from the composition of the brain as we move from the neuron level to the ganglion level. Similarly, social behaviours can be seen to emerge in group interactions as individuals are added.

Emergence may be seen as the end result of a complex interaction of evolutionary behaviours, social dynamics and logical persuasion.

Thursday, 29 November 2007

Property and Possession

Here are a few details of where Adverse Possession can be helpful. We have a nature strip at the back of our home.

Our neighbors generally waste the areas behind their property.
You can see that their area is full of weeds and noxious pest flora. Here we see around 8 meters' depth of lantana and privet.
It makes walking behind the area impossible - or at least a jungle trip. For a few years it has failed as an access strip.

A better use is made of it by working and building up the land. On the strip behind our property, we have an area that varies from 8 to 10 meters in width before it hits the area for the stream, which we leave alone. The strip runs the width of the yard, which is a little over 20 meters. In the 200 square meters or so, we have planted numerous types of citrus (lemon, lime, orange, tangerine, etc.).
In addition, we have banana palms. We started clearing and using this area the week we moved into the house in 2004. Since then we have made the land more productive each year. The crops are small at present, but they are growing each year.

I do admit, there is a greater effort involved. It is worth it and beats leaving the land to waste.

Digital Forensics in the Age of Virtualisation.

The following are a few points on the effects of virtualisation on digital forensics. This is the beginning of a paper, and a little bit of rambling, as I am researching these issues.

  • Memory state is retained by the virtualised system

  • Memory forensics is currently a technically difficult field with few qualified people

  • VMs make capture simple – of both disk and memory

  • VMs have a snapshot capability, which is handy for incident response and forensic capture

The reasons for these points come from the fact that memory (especially on Microsoft systems) may contain details of deleted files and transactions for a long time (for example, email deleted on a server may be retained in memory for weeks even though the sender believes it was deleted and wiped).

The snapshot function of a VM means that a single file can be captured containing all memory and state information.

Personally I use an open source tool called “Liveview” to examine captured images. A simple “dd” bit image can be loaded into Liveview to replay the image as if I were working on the original host. Liveview links to VMWare to play the captured images.

With the snapshot and replay functions of VMWare coupled with Liveview, I can load a copy of a forensically captured image and test it “offline”. This allows me to use tools that may alter the image without fear of contaminating the evidential value of the original – as I am only using a copy.
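
A small sketch of that “work on a copy” discipline in R: before loading anything into Liveview, confirm that the working copy is bit-identical to the original capture. The file names below are placeholders, not real evidence items.

```r
# Verify that the working copy of a dd image matches the original capture
# before any analysis is performed.  File names are placeholders.
library(tools)

hashes <- md5sum(c("evidence_original.dd", "evidence_workingcopy.dd"))

if (!any(is.na(hashes)) && length(unique(hashes)) == 1) {
  message("Working copy verified - safe to analyse without touching the original.")
} else {
  stop("Hash mismatch or missing file: do not proceed with this copy.")
}
```
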

Liveview also allows the system time used to start the image to be configured, so I can experiment without corrupting evidence.

When I have found the evidence of what has occurred, I can replay the actions that I have taken using VMWare’s replay function. This allows the evidence to be presented in a non-technical manner that a jury may comprehend.

In the case of organisations that are already using VMs, this process is simplified. The vast majority of the capture process is effectively done for me. The issue is that the host may also contain much more data than the company running the VM intended to retain.

eDiscovery and document retention come into the discussion at this point. There are requirements to hold documents when a case has started or when one is likely. As memory and state hold information, and coupled with some of the decisions in the US that may be influential here in Australia (though not authoritative), it is likely that this material could be called under subpoena or captured in an Anton Piller (civil search) order.

In this, files that a company had believed destroyed could actually be recovered.

Worse, documents outside of the request listed in the order could be inadvertently provided, given the difficulties of separating material held in state data.

Wednesday, 28 November 2007

Free Breakfast Session in Sydney

E-Discovery, Digital Forensics and how it is changing Litigation in the OnLine world

Watch this space for further details. I am offering a free breakfast session (including breakfast and networking) in Feb 2008 to launch the first of many monthly information and update sessions. Dates and booking instructions will follow soon.

Other planned sessions include:

  • IT Governance, What you need to know but your IT department is too afraid to tell you (March)
  • Document handling – from law to practice in IT. What you need to know (April)
  • PCI-DSS and its impact on IT (May)
  • Digital Forensics and Incident handling, what you need to know (June).
  • The Latest developments in Attack Technologies (What the Hackers don’t want you to know)

Experimental Design and Clustering

An issue with experimental design derives from the economic constraints placed upon it. For instance, rather than implementing truly random samples, cost constraints force researchers to use other techniques. In multi-site studies, cluster randomised designs assign entire sites to the same treatment group, with different sites assigned to different treatments. This creates statistical clusters.

Although sophisticated methods such as hierarchical linear modelling schemes (e.g., Raudenbush and Bryk, 2002) have been formulated, a variety of problems exist with the analysis of cluster randomised trials (e.g., Raudenbush and Bryk, 2002; Donner and Klar, 2000; Klar and Donner, 2002; Murray, Varnell, and Blitstein, 2004). It should also be noted that many researchers have failed to account for clustering with these more advanced statistical techniques in the design and analysis of their experiments.
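
As a rough, self-contained sketch (not taken from the cited papers), the R code below simulates a small cluster randomised trial and contrasts a multilevel model, fitted with the nlme package that ships with R, against a naive regression that ignores the clustering. The variable names and simulated effect sizes are hypothetical.

```r
# Hedged sketch of a two-level analysis for a cluster randomised design.
library(nlme)

# Simulate a small multi-site trial: 20 sites, 25 participants per site,
# with treatment assigned at the site level (this is what creates the clustering).
set.seed(1)
sites     <- factor(rep(1:20, each = 25))
treatment <- rep(rep(c(0, 1), each = 10), each = 25)    # sites 1-10 control, 11-20 treated
site_eff  <- rnorm(20, sd = 0.5)[as.integer(sites)]     # shared site-level variation
outcome   <- 1 + 0.3 * treatment + site_eff + rnorm(500)

d <- data.frame(outcome, treatment, site = sites)

# A random intercept for site acknowledges that observations within a site are
# not independent; the naive lm() below ignores this and understates the
# standard error of the treatment effect.
hlm   <- lme(outcome ~ treatment, random = ~ 1 | site, data = d)
naive <- lm(outcome ~ treatment, data = d)

summary(hlm)$tTable
summary(naive)$coefficients
```
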

Each cluster within a cluster randomised trial may be regarded as a quasi-experiment in itself. Rooney and Murray (1996) pointed out the problems of meta-analysis in cluster randomised trials due to effect-size estimation problems, where conventional estimates were not appropriate and standard errors could be incorrect.

In assessing the validity of descriptive and experimental research techniques, it is necessary to differentiate between patterns and models. When seeking patterns, we are in effect representing local properties of the data. A model, on the other hand, aims to describe the data as a whole. A commonly used example of a pattern would be association rules. An association rule notes the rate at which two variables occur together. In psychology this could be used to represent occurrences that appear together more often than would be expected if they were statistically independent.

Classification differs from clustering in that it is predictive rather than descriptive. With clustering, there is no correct answer for the allocation of observations to groups. With classification, the current data set includes group labels, and the psychologist seeks to derive a method for labelling future data that arrives without these labels.
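
The contrast can be seen in a few lines of R using the built-in iris data; the choice of kmeans for clustering and rpart for classification is simply illustrative, not prescriptive.

```r
# Clustering vs classification on the built-in iris data.
library(rpart)
set.seed(123)

# Clustering (descriptive): group the measurements with no reference to Species
km <- kmeans(iris[, 1:4], centers = 3)
table(km$cluster, iris$Species)   # clusters need not line up with any "correct" labels

# Classification (predictive): fit on labelled data, then label new, unlabelled cases
idx   <- sample(nrow(iris), 100)
model <- rpart(Species ~ ., data = iris[idx, ])
preds <- predict(model, newdata = iris[-idx, ], type = "class")
table(preds, iris$Species[-idx])  # how well the held-out cases are labelled
```
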

  • Donner, A. & Klar, N. (2000). Design and analysis of cluster randomization trials in health research. London: Arnold.

  • Donner, A. & Klar, N. (2002). Issues in the meta-analysis of cluster randomized trials. Statistics in Medicine, 21, 2971-2980.

  • Klar, N. & Donner, A. (2001). Current and future challenges in the design and analysis of cluster randomization trials. Statistics in Medicine, 20, 3729-3740.

  • Murray, D. M., Varnell, S. P., & Blitstein, J. L. (2004). Design and analysis of group-randomized trials: A review of recent methodological developments. American Journal of Public Health, 94, 423-432.

  • Raudenbush, S. W. & Bryk, A. S. (2002). Hierarchical linear models. Newbury Park, CA: Sage Publications.

  • Rooney, B. L. & Murray, D. M. (1996). A meta-analysis of smoking prevention programs after adjustment for errors in the unit of analysis. Health Education Quarterly, 23, 48-64.

Tuesday, 27 November 2007

Why Cache Poisoning?

The simplest variety of cache poisoning involves transmitting counterfeit replies to the victim’s DNS server. The technique is an artifice that results in a Domain Name Server (DNS server) caching bogus entries as if they are authentic. A poisoned DNS server will generally cache this information for a time, impacting many of the server’s clients.

A cache poisoning attack is generally performed when an attacker exploits a vulnerability in the DNS system that allows it to accept false information. When a DNS server incorrectly accepts DNS responses that it has not confirmed with an authoritative source, the server will cache the inaccurate entries and may serve other clients with these erroneous records.
This technique is used to substitute the requested site address (or other information) with one of the attacker’s choosing.
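
As a rough, back-of-the-envelope illustration (my own arithmetic, not from any cited source): classic resolvers matched responses on little more than a 16-bit transaction ID, so an attacker who floods forged replies only has to guess that ID before the legitimate answer arrives.

```r
# Probability that at least one of a burst of forged replies guesses a
# randomly chosen 16-bit DNS transaction ID (assumes a fixed source port,
# as was common before source-port randomisation).
p_success <- function(replies, id_space = 2^16) {
  1 - (1 - 1 / id_space)^replies
}

p_success(100)     # ~0.0015 for a single burst of 100 forged replies
p_success(50000)   # ~0.53  - and the attack can be retried indefinitely
```
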

Organised crime is just one of the many varieties of attacker making use of this exploit. As an example, think what could occur if an attacker could alter the IP address of an online banking site. They could, for instance, redirect the site to a reverse proxy and “harvest” username and password combinations before redirecting the unsuspecting victim to the real banking site.

The worst part: the little padlock in the browser window will lie and tell you that the site is secure.

The answer? Well, split-brain DNS is a start. Realising that the firewall is not the be-all and end-all of security goes further.

Monday, 26 November 2007

Implementing Fraud Detection using Bayesian methods in Data Sets with Benford's Law

The journal “Communications in Statistics: Simulation and Computation”, published by Taylor & Francis, featured the article “Detecting Fraud in Data Sets Using Benford's Law” in Volume 33, Number 1 (2004), pages 229–246. This article by Christina Lynn Geyer and Patricia Pepple Williamson looks at the use of Bayesian methods rather than the Distortion Factor (DF) model that is generally used to detect fraud in financial data.

The paper proposes a Bayesian alternative which the authors state “outperforms the DF model for any reasonable significance level. Similarly, the Bayesian approach proposed as an alternative to the classical chi-square goodness-of-fit test outperforms the chi-square test for reasonable significance levels”. [1]

The purpose of the original project that spawned this post was to write an R function that implements this approach to using Benford’s law. The input will be a set of financial data. The expected output will be a statistical likelihood of fraudulent transactions being present in the data.
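
A minimal sketch of such a function in base R is shown below. It implements only the classical first-digit chi-square goodness-of-fit comparison mentioned above, not the authors' Bayesian calculation, and the simulated “amounts” are purely illustrative.

```r
# Sketch of a first-digit Benford test in base R.  This is the classical
# chi-square goodness-of-fit comparison, not the Bayesian calculation from
# the Geyer & Williamson paper.
benford_test <- function(x) {
  x <- abs(x[x != 0 & !is.na(x)])
  # Leading digit: shift each value into [1, 10) and truncate
  first_digit <- floor(x / 10^floor(log10(x)))
  observed    <- table(factor(first_digit, levels = 1:9))
  expected_p  <- log10(1 + 1 / (1:9))      # Benford first-digit probabilities
  chisq.test(observed, p = expected_p)
}

# Example: amounts drawn log-uniformly over several orders of magnitude
# follow Benford's law closely.
set.seed(42)
amounts <- exp(runif(5000, log(1), log(1e6)))
benford_test(amounts)
```
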

The aim was to provide an alternative approach to analysing data to the Distortion Factor (DF) model, which was developed by Mark Nigrini and first appeared in Nigrini (1996). The DF model makes two assumptions:

  • “That people do not manipulate data outside of the original magnitude in other words, a person is more likely to change a 10 to a 12 than change a 10 to a 100.”[2],
  • And that “percentage of manipulation is approximately equal across the magnitudes. This means that someone may change a 50 to a 55 or a 500 to a 550, but would probably not change a 500 to a 505.”[3]

Geyer and Williamson (2004) propose a Bayesian approach as an alternative to the DF model first proposed by Nigrini. They demonstrate that this process is more efficient for any reasonable significance level. They further note that, although there is generally little value in comparing hypotheses tested using different approaches, the DF and Bayesian methods of expressing the likelihood of finding fraudulent data are based on different calculations and so may be compared for validity. Their results and data lead them to conclude that the Bayesian model is a valid alternative to Nigrini’s DF model.

The process is of great interest to tax accounting, financial audit and forensic data analysis. As the data in a company’s financial reports should conform to Benford’s law if truthfully reported, nonconformity will raise a level of distrust even if the data is valid.

The paper discusses the existing methods used to implement a data analysis using Benford’s law and compares these with two Bayesian alternatives: the one proposed by the authors and another by Ley (1996).

The authors have demonstrated (using a variety of data sets) that the Bayesian approach is valid and gives the same results as the DF method. More importantly, they note that it is also more efficient. This is of value in the accounting and audit sectors. The improved performance makes ongoing data analysis feasible, and increasing the ability to automate and review data to detect fraud on an ongoing basis makes this process highly valuable to business.

Algorithms
This section details the method used in the calculation of the Bayes factor proposed by Geyer and Williamson (2004). The alternative (and original) method, the Distortion Factor (DF) developed by Nigrini (2000), is included for comparison.

  • β0 is the Bayes factor for the relative likelihood of H0 to H1 provided solely by the data, as defined by Geyer and Williamson (2004)
  • β1 is the alternative Bayes factor as defined by Geyer and Williamson (2004) in section 4.1 of their paper.
  • θ0 is the mean of a Benford set scaled to the interval [10, 100)

  • AM is the Actual Mean
  • EM is the Expected Mean
  • DF is the Distortion Factor
  • The Distortion Factor is defined by Nigrini (2000, p61) as a method of testing conformity of data to Benford’s law.
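
A sketch of the DF calculation in R follows, based on the usual description of Nigrini’s model (collapse each amount into [10, 100), then compare the actual mean with the expected mean of a Benford set). The simulated invoice amounts are purely illustrative, and the exact constants should be checked against Nigrini (2000).

```r
# Sketch of the Distortion Factor (DF) calculation as usually described for
# Nigrini's model: collapse every amount into [10, 100), then compare the
# Actual Mean (AM) with the Expected Mean (EM) of a Benford set.
distortion_factor <- function(x) {
  x <- abs(x[x != 0 & !is.na(x)])
  n <- length(x)

  # Collapse each value to [10, 100) by dropping its order of magnitude
  collapsed <- 10 * x / 10^floor(log10(x))

  am <- mean(collapsed)
  # Expected mean of a Benford set on [10, 100); tends to 90 / log(10) ~ 39.09
  em <- 90 / (n * (10^(1 / n) - 1))

  df <- (am - em) / em   # positive: digits skewed high; negative: skewed low
  list(AM = am, EM = em, DF = df)
}

# Illustrative run on simulated invoice amounts
set.seed(7)
invoices <- exp(runif(2000, log(10), log(1e5)))
distortion_factor(invoices)
```
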

Using the census dataset demonstrates a real world example of Benford’s Law. This dataset is a set of US Census data as used by Nigrini (2000) and distributed by him. The other datasets are ones which come with the R package.

We can see that the biggest issue with these datasets is their size. It would be expected that increasing the size of these datasets would bring them more into line with the expectations of Benford’s law.

It must be further noted that the two-digit tests require a far larger sample to conform to the law.

A known bad dataset based on Airtravel is demonstrated below.


These techniques provide a good starting point for fraud analysis, which is where I currently use them (and why I developed these techniques).

Where I plan to take this is traffic and anomaly analysis. The methods match well with traffic patterns. For instance, Loki has shown itself to create patterns of traffic in ICMP that are easily detected using two-digit Benford analysis. Eventually I hope to bring these techniques into common use for IT security.

  1. Geyer & Williamson, 2004, p. 245.
  2. Taylor & Francis, 2000.
  3. ibid.