input/output

information watchdogs

by Jean Thilmany, Associate Editor

If you're worried that computers are unassailable, fear not: It turns out that computers need watchdogs, too. But instead of an angry, barking dog, they depend on programs keyed to troll through important databases to look for suspicious signs of entry.

Researchers at Pennsylvania State University have tested and ranked three nonstatistical data-mining methods that classify and detect the telltale patterns of entry and misuse left by typical computer network intruders. The method that proved best at foiling computer hackers is called rough sets, although the researchers say the network-security industry has largely overlooked it.

The researchers say that computer security breaches have risen significantly in the last three years.

The U.S. General Accounting Office says the number of computer attacks in the United States is doubling every year. Fewer than 4 percent of those attacks will be detected, and just 1 percent will be reported. About 250,000 attempts were made in a one-year period to break into the federal computer system and 64 percent of those attempts were successful, according to the GAO.

With those statistics in mind, Chao-Hsien Chu, an associate professor of information sciences and technology at Penn State, undertook the research project aimed at determining the best way to monitor databases and uncover attacks. He began the study while on the faculty at Iowa State University in Ames.

"No network security system or firewall can ever be completely foolproof," Chu said. "So there's always a need for a watchdog to patrol the network and signal when an intrusion occurs. Commercially available watchdog systems depend on traditional statistical techniques. However, the newer smart methods promise to have a significant impact on accuracy."

Even the cleverest intruder leaves electronic footprints upon breaking into a secure computer data network, such as a bank, medical, or credit-records database, Chu said. Used to detect intruders, the new watchdog methods collect information from a variety of sources within the network, learn the patterns typical of a perpetrator, and make a reasoned judgment about whether a given pattern represents an intrusion.

Chu and his team focused on three smart approaches, which are known as data-mining techniques: neural nets, inductive learning, and rough sets. All three data mining techniques can collect and learn information, and make reasoned predictions.

Typically, a neural network is initially trained or fed large amounts of data and rules about data relationships.

Systems that use inductive learning methods are programmed to infer general laws from particular examples. Knowledge is built up by generalizing patterns from past experience. For example, if a system is fed a number of descriptions of insolvent customers, it might learn to recognize a potentially insolvent customer in the future by relying on those past descriptions.
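The insolvent-customer example can be sketched in a few lines of Python. This is only an illustration of inferring a general rule from particular examples, not the method the Penn State team used. It relies on a very simple rule learner known as OneR, and the customer records and attribute names (late_payments, income) are invented for the purpose.

```python
from collections import Counter, defaultdict

def one_r(examples, label_key):
    """Learn a one-attribute rule set (OneR) from labeled examples."""
    attributes = [k for k in examples[0] if k != label_key]
    best = None
    for attr in attributes:
        # Count the labels seen for each value of this attribute.
        counts = defaultdict(Counter)
        for ex in examples:
            counts[ex[attr]][ex[label_key]] += 1
        # Rule: each attribute value predicts its most common label.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Score the rule by how many training examples it gets wrong.
        errors = sum(ex[label_key] != rule[ex[attr]] for ex in examples)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best[0], best[1]

# Hypothetical descriptions of past customers.
customers = [
    {"late_payments": "many", "income": "low", "insolvent": True},
    {"late_payments": "many", "income": "high", "insolvent": True},
    {"late_payments": "few", "income": "low", "insolvent": False},
    {"late_payments": "few", "income": "high", "insolvent": False},
    {"late_payments": "many", "income": "low", "insolvent": True},
]

attr, rule = one_r(customers, "insolvent")

# Apply the learned rule to a customer the system has not seen before.
new_customer = {"late_payments": "many", "income": "high"}
print(attr, rule[new_customer[attr]])  # late_payments True
```

Here the learner finds that the number of late payments separates the past examples without error, so it uses that single attribute to flag the new customer.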

Systems that use rough set theory take advantage of imprecision, vagueness, and uncertainty in data analysis. They focus on discovering patterns, rules, and knowledge in large pools of data.

Those who have used neural nets and inductive learning for intrusion detection and for research found them to be successful and effective, Chu said. But the rough sets method, a relatively new approach, had not previously been applied to intrusion detection. He says his team's study is the first to evaluate and compare multiple data mining methods, including rough sets, in intrusion detection.

The researchers say the rough sets detection method doesn't require any preliminary or additional information about the data, and can work with missing values and less expensive sets of measurements than the other methods.

Rough sets handle imprecise values by replacing imprecise or uncertain data with a pair of lower and upper approximations. The method can also discover important facts that are hidden in the data and express them in the language of decision rules, Chu said. He called rough sets a powerful method for characterizing complex, multidimensional patterns.
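A small Python sketch may make the lower and upper approximations more concrete. It is not the researchers' code, and the records, attribute names (calls, failed_logins), and labels are invented for illustration. Records with identical attribute values are treated as indistinguishable. The lower approximation holds the records that certainly belong to the intrusion class; the upper approximation holds those that possibly do.

```python
from collections import defaultdict

def approximations(records, attributes, target_ids):
    """Return the (lower, upper) approximations of target_ids.

    records: dict mapping record id -> dict of attribute values
    attributes: attribute names used to decide indistinguishability
    target_ids: set of record ids in the concept (here, intrusions)
    """
    # Group records that look identical on the chosen attributes.
    classes = defaultdict(set)
    for rid, row in records.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(rid)

    lower, upper = set(), set()
    for members in classes.values():
        if members <= target_ids:   # every look-alike is an intrusion
            lower |= members
        if members & target_ids:    # at least one look-alike is an intrusion
            upper |= members
    return lower, upper

# Hypothetical toy data: five observed sessions.
records = {
    1: {"calls": "many", "failed_logins": "high"},
    2: {"calls": "many", "failed_logins": "high"},
    3: {"calls": "few", "failed_logins": "high"},
    4: {"calls": "few", "failed_logins": "high"},
    5: {"calls": "few", "failed_logins": "low"},
}
intrusions = {1, 2, 3}  # sessions known to be intrusions

lower, upper = approximations(records, ["calls", "failed_logins"], intrusions)
print(sorted(lower))  # [1, 2]
print(sorted(upper))  # [1, 2, 3, 4]
```

Sessions 3 and 4 look identical but only one is an intrusion, so both fall in the gap between the two approximations. That gap is where the data are too coarse to decide.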

In their study, the team used data from the program Sendmail, which is used in nearly every Unix-based system that has e-mail. The Sendmail data used in the study included what Chu called normal and abnormal traces. Normal traces show the program operating as usual. Abnormal traces include intrusions that exploit well-known problems in the Unix system. The researchers ranked the three detection methods by how often they ferreted out the abnormal traces.

The average accuracy ratings for the three methods were: rough sets, 76 percent accurate; neural networks, 70 percent; and inductive learning, 51 percent.

"The tremendous growth in the Internet and in electronic commerce has created serious challenges to network security," Chu said. "Advances in data mining and knowledge discovery provide new approaches to network intrusion detection."




© 2003 by The American Society of Mechanical Engineers