The adoption of business intelligence technology in industry is growing rapidly. Business managers are no longer satisfied with ad hoc and static reports, and they ask for more flexible, easy-to-use data analysis tools. Recently, application interfaces that expand the range of operations available to the user while hiding the underlying complexity have been developed. The paper presents eLog, a business intelligence solution designed and developed in collaboration between the database group of the University of Modena and Reggio Emilia and eBilling, an Italian SME supplier of solutions for the design, production and automation of documentary processes for top Italian companies. eLog enables business managers to define OLAP reports by means of a web interface and to customize analysis indicators using a simple meta-language. The framework translates the user's reports into MDX queries and is able to automatically select the data cube suitable for each query. Over 140 medium and large companies have exploited the technological services of eBilling S.p.A. to manage their document flows. In particular, eLog services have been used by the major Italian media and telecommunications companies and their foreign affiliates, such as Sky, Mediaset, H3G and TIM Brasil. The largest customer can produce up to 30 million mail pieces within 6 months (about 200 GB of data in the relational DBMS). Over a period of 18 months, eLog could have to handle 150 million mail pieces (1 TB of data).
The core idea of this project is to create multidimensional data from the obtained normalized database. After creating the database, the data needs to be pointed towards various dimensionalities based on the business requirements. Once the traditional flow of multidimensional data creation is complete, the data will be factorized to generate reports using multidimensional queries. MDX is a specialized syntax for querying and manipulating the multidimensional data stored in OLAP cubes. While it is possible to translate some of these queries into traditional SQL, doing so frequently requires the synthesis of grouped SQL expressions even for very simple MDX expressions. Our objective is to create a custom engine that builds the MDX queries in the background and provides apt data functionality through a user-friendly front end.
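To make the engine's role concrete, the following is a minimal, hypothetical sketch of how a user's report definition might be turned into an MDX query string behind the scenes. The cube, dimension and measure names are illustrative, not taken from eLog.

```python
# Hypothetical sketch: translating a web-defined report into an MDX query
# string. All names (cube, dimensions, measure) are illustrative.

def build_mdx(cube, measure, rows_dim, cols_dim):
    """Assemble a minimal MDX SELECT for one measure and two dimensions."""
    return (
        f"SELECT {cols_dim}.MEMBERS ON COLUMNS, "
        f"{rows_dim}.MEMBERS ON ROWS "
        f"FROM [{cube}] "
        f"WHERE ([Measures].[{measure}])"
    )

query = build_mdx("MailPieces", "PieceCount",
                  "[Customer].[Company]", "[Time].[Month]")
print(query)
```

A real engine would additionally validate the dimensions against the cube's metadata before choosing which cube to query.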
Usually, data mining is considered the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. In our data-driven data mining model, knowledge originally exists in the data but is not understandable to humans. Data mining is taken as a process of transforming knowledge from the data format into some other human-understandable format such as rules, formulas or theorems. In order to keep the knowledge unchanged in a data mining process, the knowledge properties should be kept unchanged during the knowledge transformation process. Many real-world data mining tasks are highly constraint-based and domain-oriented. Thus, domain prior knowledge should also be a knowledge source for data mining. The control a user exerts over a data mining process could also be taken as a kind of dynamic input to that process. Thus, a data mining process mines knowledge not only from data but also from humans. This is the key idea of Domain-oriented Data-driven Data Mining (3DM). In the view of granular computing (GrC), a data mining process can be considered the transformation of knowledge between different granularities. Original data is a representation of knowledge at the finest granularity and is not understandable to humans, who are instead sensitive to knowledge at coarser granularities. A data mining process can therefore be considered a transformation of knowledge from a finer granularity space to a coarser granularity space. The understandings of data mining in 3DM and GrC are consistent with each other. Rough set and fuzzy set are two important computing paradigms of GrC. Both are generalizations of classical set theory for modeling vagueness and uncertainty. Although both can be used to address vagueness, they are not rivals; in some real problems they are even complementary to each other. In this plenary talk, this new understanding of data mining, domain-oriented data-driven data mining (3DM), will be introduced.
The relationship between 3DM and GrC, and granular computing based data mining from the viewpoints of rough set and fuzzy set, will be discussed.
The following topics are dealt with: data mining in the Web 2.0 environment; knowledge discovery from multimedia data and multimedia applications; mining and management of biological data; data mining in medicine; optimization-based data mining techniques; high-performance data mining; data stream mining and management; spatial and spatio-temporal data mining.
In this paper, we propose four data mining models for the Internet of Things: a multi-layer data mining model, a distributed data mining model, a Grid-based data mining model, and a data mining model from a multi-technology integration perspective. Among them, the multi-layer model includes four layers: (1) data collection layer, (2) data management layer, (3) event processing layer, and (4) data mining service layer. The distributed data mining model can solve problems arising from data deposited at different sites. The Grid-based data mining model allows the Grid framework to realize the functions of data mining. The data mining model from the multi-technology integration perspective describes the corresponding framework for the future Internet. Several key issues in data mining for the IoT are also discussed.
Many organizations often underutilize their existing data warehouses. In this paper, we suggest a way of acquiring more information from corporate data warehouses without the complications and drawbacks of deploying additional software systems. Association-rule mining, which captures co-occurrence patterns within data, has attracted considerable effort from data warehousing researchers and practitioners alike. Unfortunately, most data mining tools are loosely coupled, at best, with the data warehouse repository. Furthermore, these tools can often find association rules only within the main fact table of the data warehouse (thus ignoring the information-rich dimensions of the star schema) and are not easily applied to the non-transaction-level data often found in data warehouses. In this paper, we present a new data-mining framework that is tightly integrated with data warehousing technology. Our framework has several advantages over the use of separate data mining tools. First, the data stays in the data warehouse, and thus the management of security and privacy issues is greatly reduced. Second, we utilize the query processing power of the data warehouse itself, without using a separate data-mining tool. In addition, this framework allows ad-hoc data mining queries over the whole data warehouse, not just over the transformed portion of the data that is required when a standard data-mining tool is used. Finally, this framework also expands the domain of association-rule mining from transaction-level data to aggregated data as well.
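The co-occurrence statistics at the heart of association-rule mining reduce to two quantities, support and confidence, which a warehouse can compute with plain aggregation. A minimal sketch over a toy transaction list (the items are illustrative):

```python
# Support and confidence, the two statistics behind association rules.
# The tiny transaction list is illustrative only.

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, txns):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in txns) / len(txns)

def confidence(antecedent, consequent, txns):
    """P(consequent | antecedent) estimated from the transactions."""
    return support(antecedent | consequent, txns) / support(antecedent, txns)

print(support({"bread", "milk"}, transactions))      # 0.5
print(confidence({"bread"}, {"milk"}, transactions))
```

In the framework described above, these counts would come from GROUP BY queries run directly against the fact and dimension tables rather than from an exported transaction file.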
Data mining is the process of posing queries and extracting patterns, often previously unknown, from large quantities of data using pattern matching or other reasoning techniques. Data mining has many applications in security, including national security as well as cyber security. Threats to national security include attacking buildings and destroying critical infrastructures such as power grids and telecommunication systems. Data mining techniques are being investigated to find out who the suspicious people are and who is capable of carrying out terrorist activities. Cyber security is concerned with protecting computer and network systems against corruption due to Trojan horses, worms and viruses. Data mining is also being applied to provide solutions such as intrusion detection and auditing. The first part of the presentation will discuss my joint research with Prof. Latifur Khan and our students at the University of Texas at Dallas on data mining for cyber security applications. For example, anomaly detection techniques could be used to detect unusual patterns and behaviors. Link analysis may be used to trace viruses to the perpetrators. Classification may be used to group various cyber attacks and then use the profiles to detect an attack when it occurs. Prediction may be used to determine potential future attacks based, in a way, on information learned about terrorists through email and phone conversations.
The well-known privacy-preserved data mining modifies existing data mining techniques to work on randomized data. In this paper, we investigate data mining as a technique for masking data, therefore termed data mining based privacy protection. This approach partially incorporates the requirements of a targeted data mining task into the process of masking data so that essential structure is preserved in the masked data. The idea is simple but novel: we explore the data generalization concept from data mining as a way to hide detailed information, rather than to discover trends and patterns. Once the data is masked, standard data mining techniques can be applied without modification. Our work demonstrates another positive use of data mining technology: not only can it discover useful patterns, but it can also mask private information. We consider the following privacy problem: a data holder wants to release a version of data for building classification models, but wants to protect against linking the released data to an external source for inferring sensitive information. We adapt an iterative bottom-up generalization from data mining to generalize the data. The generalized data remains useful for classification but becomes difficult to link to other sources. The generalization space is specified by a hierarchical structure of generalizations. A key step is identifying the best generalization to climb up the hierarchy at each iteration. Enumerating all candidate generalizations is impractical. We present a scalable solution that examines at most one generalization in each iteration for each attribute involved in the linking.
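One step of bottom-up generalization can be sketched as climbing a value hierarchy: each climb replaces detailed values with a coarser parent, making records harder to link. The hierarchy and records below are made up for illustration; the paper's method additionally scores candidate climbs and picks the best one per iteration.

```python
# Illustrative sketch of climbing a generalization hierarchy to hide detail.
# The hierarchy and records are invented for the example.

hierarchy = {  # child value -> coarser parent value
    "Paris": "France", "Lyon": "France",
    "Rome": "Italy", "Milan": "Italy",
    "France": "Europe", "Italy": "Europe",
}

def generalize(value, levels=1):
    """Climb the hierarchy the given number of levels (stops at the root)."""
    for _ in range(levels):
        value = hierarchy.get(value, value)
    return value

records = ["Paris", "Lyon", "Rome"]
print([generalize(r) for r in records])     # ['France', 'France', 'Italy']
print([generalize(r, 2) for r in records])  # ['Europe', 'Europe', 'Europe']
```

After one climb the three records fall into only two groups, and after two climbs into one, trading linkability for classification detail.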
This paper accentuates an approach to implementing distributed data mining (DDM) using multi-agent system (MAS) technology, and proposes a data mining technique called "CAKE" (classifying, associating & knowledge discovery). The architecture is based on centralized parallel data mining agents (PADMAs). Data mining is part of what has recently been introduced as BI, or business intelligence. The need is to derive knowledge out of abstract data. The process is difficult, complex, time consuming and resource starving. These highlighted problems are addressed in the proposed model. The model architecture is distributed, uses a knowledge-driven mining technique, and is flexible enough to work on any data warehouse, which helps overcome these problems. Good knowledge of the data, metadata and business domain is required for defining rules for data mining. It is assumed that the data and the data warehouse have already gone through the necessary processes and are ready for data mining.
With the increasing use of database applications, mining interesting information from huge databases has become a matter of great concern, and a variety of mining algorithms have been proposed in recent years. As we know, the data processed in data mining may be obtained from many sources in which different data types may be used. However, no algorithm can be applied to all applications due to the difficulty of fitting data types to the algorithm. The selection of an appropriate data mining algorithm is based not only on the goal of the application, but also on data fitability. Therefore, transforming a non-fitting data type into a target one is also important in data mining, but the work is often tedious or complex since many data types exist in the real world. Merging the similar data types of a given selected mining algorithm into a generalized data type seems to be a good approach to reduce the transformation complexity. In this work, the data type fitability problem for six kinds of widely used data mining techniques is discussed and a data type generalization process, including merging and transforming phases, is proposed. In the merging phase, the original data types of the data sources to be mined are first merged into generalized ones. The transforming phase is then used to convert the generalized data types into the target ones for the selected mining algorithm. Using the data type generalization process, the user can select an appropriate mining algorithm based solely on the goal of the application, without considering the data types.
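The two-phase process above can be sketched as two lookup steps: a merge table collapsing concrete types into generalized ones, then a transform table mapping each generalized type to the target type a given algorithm expects. All type and algorithm names below are illustrative, not the paper's taxonomy.

```python
# Sketch of the merge-then-transform data type generalization process.
# Type and algorithm names are illustrative.

MERGE = {          # original type -> generalized type
    "int": "numeric", "float": "numeric", "decimal": "numeric",
    "char": "categorical", "varchar": "categorical", "enum": "categorical",
}
TRANSFORM = {      # (generalized type, algorithm) -> target type
    ("numeric", "decision_tree"): "discretized",
    ("categorical", "decision_tree"): "categorical",
}

def fit_type(original, algorithm):
    """Merge the source type, then transform it for the chosen algorithm."""
    generalized = MERGE[original]
    return TRANSFORM[(generalized, algorithm)]

print(fit_type("float", "decision_tree"))   # discretized
print(fit_type("enum", "decision_tree"))    # categorical
```

The point of the merge phase is visible in the table sizes: six concrete types need only two transform rules per algorithm instead of six.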
This paper proposes a framework for cost-sensitive classification under a generalized cost function. By combining decision trees with sequential binary programming, we can handle unequal misclassification costs, constrained classification, and complex objective functions that other methods cannot. Our approach has two main contributions. First, it provides a new method for cost-sensitive classification that outperforms a traditional, accuracy-based method and some current cost-sensitive approaches. Second, and more important, our approach can handle a generalized cost function, instead of the simpler misclassification cost matrix to which other approaches are limited.
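The core of cost-sensitive classification with unequal misclassification costs is that the predicted class minimizes expected cost rather than maximizing probability. A minimal sketch with an illustrative 2-class cost matrix (the paper's method additionally handles constraints and generalized cost functions, which this sketch does not):

```python
# Minimal sketch of cost-sensitive prediction: choose the class with the
# lowest expected misclassification cost. Numbers are illustrative.

def min_cost_class(probs, cost):
    """probs[i]: P(true class = i); cost[i][j]: cost of predicting j when i."""
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)

probs = [0.7, 0.3]                  # class 0 is more probable...
cost = [[0, 1],                     # ...but missing class 1 costs 10x more
        [10, 0]]
print(min_cost_class(probs, cost))  # 1
```

Here an accuracy-based classifier would predict class 0, while the cost-sensitive rule predicts class 1 because its expected cost (0.7) is lower than that of class 0 (3.0).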
Many important industrial applications rely on data mining methods to uncover patterns and trends in large data warehouse environments. Since a data warehouse is typically updated periodically in batch mode, the mined patterns have to be updated as well. This requires not only accuracy from data mining methods but also fast availability of up-to-date knowledge, particularly in the presence of a heavy update load. To cope with this problem, we propose the use of online data mining algorithms which permanently store the discovered knowledge in suitable data structures and enable an efficient adaptation of these structures after insertions and deletions on the raw data. In this paper, we demonstrate how hierarchical clustering methods can be reformulated as online algorithms based on the hierarchical clustering method OPTICS, using a density estimator for data grouping. We also discuss how this algorithmic scheme can be specialized for efficient online single-link clustering. A broad experimental evaluation demonstrates superior efficiency, with significant speed-up factors even for large bulk insertions and deletions.
In this paper we propose a novel approach that uses the structure as well as the content of emails in a folder for email classification. Our approach is based on the premise that representative (common and recurring) structures/patterns can be extracted from a pre-classified email folder and used effectively for classifying incoming emails. A number of factors that influence representative structure extraction and classification are analyzed conceptually and validated experimentally. In our approach, the notion of inexact graph matching is leveraged for deriving structures that provide coverage for characterizing folder contents. Extensive experimentation validates the selection of parameters and the effectiveness of our approach for email classification.
We consider the problem of detecting anomalies in data that arise as multidimensional arrays, with each dimension corresponding to the levels of a categorical variable. In typical data mining applications, the number of cells in such arrays is usually large. Our primary focus is detecting anomalies by comparing information at the current time to historical data. Naive approaches advocated in the process control literature do not work well in this scenario due to the multiple testing problem: performing multiple statistical tests on the same data produces an excessive number of false positives. We use an empirical Bayes method which works by fitting a two-component Gaussian mixture to deviations at the current time. The approach is scalable to problems that involve monitoring a massive number of cells and is fast enough to be potentially useful in many streaming scenarios. We show the superiority of the method relative to a naive "per component error rate" procedure through simulation. A novel feature of our technique is the ability to suppress deviations that are merely the consequence of sharp changes in the marginal distributions. This research was motivated by the need to extract critical application information and business intelligence from the daily logs that accompany large-scale spoken dialog systems deployed by AT&T. We illustrate our method on one such system.
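The two-component mixture idea can be sketched as follows: model deviations as coming from a narrow "normal" Gaussian or a wide "anomalous" one, and flag cells whose posterior probability of the wide component is high. The mixture parameters below are fixed by hand for illustration; the paper fits them empirically to the current deviations.

```python
# Hedged sketch of scoring a deviation under a two-component Gaussian
# mixture (narrow "normal" vs. wide "anomalous" component).
# Mixture parameters are fixed for illustration, not fitted.
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def anomaly_posterior(dev, w_norm=0.95, s_norm=1.0, s_anom=5.0):
    """Posterior probability that a deviation came from the wide component."""
    p_norm = w_norm * normal_pdf(dev, 0.0, s_norm)
    p_anom = (1 - w_norm) * normal_pdf(dev, 0.0, s_anom)
    return p_anom / (p_norm + p_anom)

print(anomaly_posterior(0.5))  # small: a typical deviation
print(anomaly_posterior(6.0))  # near 1: likely anomalous
```

Because every cell is scored against the same fitted mixture rather than tested independently, the excess false positives of per-cell testing are avoided.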
We present a new framework for classifier fusion that uses a shared sampling distribution for obtaining a weighted classifier ensemble. The weight update process is self-regularizing, as subsequent classifiers trained on the disjoint views rectify the bias introduced by any classifier in preceding iterations. We provide theoretical guarantees that our approach indeed provides results which are better than the case when boosting is performed separately on different views. The results are shown to outperform other classifier fusion strategies on a well-known texture image database.
Accurate topical classification of user queries allows for increased effectiveness and efficiency in general-purpose Web search systems. Such classification becomes critical if the system is to return results not just from a general Web collection but from topic-specific back-end databases as well. Maintaining sufficient classification recall is very difficult as Web queries are typically short, yielding few features per query. This feature sparseness, coupled with the high query volumes typical for a large-scale search service, makes manual and supervised learning approaches alone insufficient. We use an application of computational linguistics to develop an approach for mining the vast amount of unlabeled data in Web query logs to improve automatic topical Web query classification. We show that our approach, in combination with manual matching and supervised learning, allows us to classify a substantially larger proportion of queries than any single technique. We examine the performance of each approach on a real Web query stream and show that our combined method accurately classifies 46% of queries, outperforming the recall of the best single approach by nearly 20%, with a 7% improvement in overall effectiveness.
Given a large collection of medical images of several conditions and treatments, how can we succinctly describe the characteristics of each setting? For example, given a large collection of retinal images from several different experimental conditions (normal, detached, reattached, etc.), how can data mining help biologists focus on important regions in the images or on the differences between experimental conditions? If the images were text documents, we could find the main terms and concepts for each condition by existing IR methods (e.g., tf/idf and LSI). We propose something analogous, but for the much more challenging case of an image collection: We propose to automatically develop a visual vocabulary by breaking images into n × n tiles and deriving key tiles ("ViVos") for each image and condition. We experiment with numerous domain-independent ways of extracting features from tiles (color histograms, textures, etc.), and several ways of choosing characteristic tiles (PCA, ICA). We perform experiments on two disparate biomedical datasets. The quantitative measure of success is classification accuracy: our "ViVos" achieve high classification accuracy (up to 83% for a nine-class problem on feline retinal images). More importantly, qualitatively, our "ViVos" do an excellent job as "visual vocabulary terms": they have biological meaning, as corroborated by domain experts; they help spot characteristic regions of images, exactly like text vocabulary terms do for documents; and they highlight the differences between pairs of images.
The problem of record linkage focuses on determining whether two object descriptions refer to the same underlying entity. Addressing this problem effectively has many practical applications, e.g., elimination of duplicate records in databases and citation matching for scholarly articles. In this paper, we consider a new domain where the record linkage problem is manifested: Internet comparison shopping. We address the resulting linkage setting that requires learning a similarity function between record pairs from streaming data. The learned similarity function is subsequently used in clustering to determine which records are co-referent and should be linked. We present an online machine learning method for addressing this problem, where a composite similarity function based on a linear combination of basic functions is learned incrementally. We illustrate the efficacy of this approach on several real-world datasets from an Internet comparison shopping site, and show that our method is able to effectively learn various distance functions for product data with differing characteristics. We also provide experimental results that show the importance of considering multiple performance measures in record linkage evaluation.
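The learned composite similarity can be sketched as a linear combination of basic similarity functions whose weights are updated incrementally from streaming labeled pairs. The basic functions, the perceptron-style update, and the toy product titles below are all illustrative stand-ins for the paper's method.

```python
# Sketch of incrementally learning a composite similarity function as a
# linear combination of basic similarities (perceptron-style updates).
# Basic functions and data are illustrative.

def basic_features(a, b):
    """Two toy basic similarities over product-title strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    jaccard = len(ta & tb) / len(ta | tb)
    len_ratio = min(len(a), len(b)) / max(len(a), len(b))
    return [jaccard, len_ratio]

def train(pairs, lr=0.5, epochs=20):
    w, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for a, b, label in pairs:          # label: 1 co-referent, 0 not
            f = basic_features(a, b)
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + bias > 0 else 0
            err = label - pred             # 0 when already correct
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            bias += lr * err
    return w, bias

pairs = [
    ("apple iphone 13", "iphone 13 apple", 1),
    ("apple iphone 13", "samsung tv 55", 0),
]
w, b = train(pairs)
score = lambda a, c: sum(wi * fi for wi, fi in zip(w, basic_features(a, c))) + b
print(score("apple iphone 13", "iphone 13 apple") > 0)  # True
```

In the full system, the scores of such a learned function feed a clustering step that decides which records are co-referent.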
Assessing rules with interestingness measures is the cornerstone of successful applications of association rule discovery. However, there exists no information-theoretic measure which is adapted to the semantics of association rules. In this article, we present the directed information ratio (DIR), a new rule interestingness measure based on information theory. DIR is specially designed for association rules, and in particular it differentiates the two opposite rules a → b and a → ¬b. Moreover, to our knowledge, DIR is the only rule interestingness measure which rejects both independence and (what we call) equilibrium, i.e., it discards both the rules whose antecedent and consequent are negatively correlated and the rules which have more counter-examples than examples. Experimental studies show that DIR is a highly selective measure, which is useful for association rule post-processing.
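The two rejection criteria attributed to DIR can be made concrete without its exact information-theoretic formula (which is in the paper): reject a rule a → b if its antecedent and consequent are negatively correlated (confidence not above P(b)), or if it has more counter-examples than examples (confidence not above 0.5). A sketch of just these two checks:

```python
# Sketch of the two rejection criteria described for DIR: independence
# (non-positive correlation) and equilibrium (more counter-examples than
# examples). This is not the DIR formula itself, only its filtering behavior.

def keep_rule(n_a, n_ab, n_total, n_b):
    """n_a: rows with a; n_ab: rows with both a and b; n_b: rows with b."""
    conf = n_ab / n_a                  # P(b | a)
    p_b = n_b / n_total                # P(b)
    rejects_independence = conf > p_b  # a and b positively correlated
    rejects_equilibrium = conf > 0.5   # more examples than counter-examples
    return rejects_independence and rejects_equilibrium

print(keep_rule(n_a=40, n_ab=30, n_total=100, n_b=50))  # True
print(keep_rule(n_a=40, n_ab=18, n_total=100, n_b=50))  # False: conf < 0.5
```

A measure passing both checks is what makes DIR unusually selective among interestingness measures.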
Data mining algorithms are facing the challenge of dealing with an increasing number of complex objects. For graph data, a whole toolbox of data mining algorithms becomes available by defining a kernel function on instances of graphs. Graph kernels based on walks, subtrees and cycles in graphs have been proposed so far. As a general problem, these kernels are either computationally expensive or limited in their expressiveness. We try to overcome this problem by defining expressive graph kernels which are based on paths. As the computation of all paths and longest paths in a graph is NP-hard, we propose graph kernels based on shortest paths. These kernels are computable in polynomial time, retain expressivity and are still positive definite. In experiments on classification of graph models of proteins, our shortest-path kernels show significantly higher classification accuracy than walk-based kernels.
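The core computation can be sketched in two steps: all-pairs shortest path lengths per graph (Floyd-Warshall), then a kernel value obtained by matching path lengths between the two graphs. This is a simplified delta-kernel variant of the idea on unlabeled graphs, not the paper's exact formulation.

```python
# Simplified sketch of a shortest-path graph kernel: Floyd-Warshall for
# all-pairs shortest paths, then count matching path lengths across graphs.
INF = float("inf")

def shortest_paths(n, edges):
    """Return the multiset of finite shortest-path lengths of the graph."""
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        d[u][v] = d[v][u] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return [d[i][j] for i in range(n) for j in range(i + 1, n) if d[i][j] < INF]

def sp_kernel(g1, g2):
    """Count pairs of equal shortest-path lengths across the two graphs."""
    p1, p2 = shortest_paths(*g1), shortest_paths(*g2)
    return sum(1 for a in p1 for b in p2 if a == b)

triangle = (3, [(0, 1), (1, 2), (0, 2)])
path = (3, [(0, 1), (1, 2)])
print(sp_kernel(triangle, triangle))  # 9
print(sp_kernel(triangle, path))      # 6
```

Because shortest paths are computable in polynomial time, the whole kernel stays polynomial, unlike kernels over all paths.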
Many applications track the movement of mobile objects, which can be represented as sequences of timestamped locations. Given such a spatiotemporal series, we study the problem of discovering sequential patterns, which are routes frequently followed by the object. Sequential pattern mining algorithms for transaction data are not directly applicable to this setting. The challenges to address are: (i) the fuzziness of locations in patterns, and (ii) the identification of non-explicit pattern instances. In this paper, we define pattern elements as spatial regions around frequent line segments. Our method first transforms the original sequence into a list of sequence segments and detects frequent regions in a heuristic way. Then, we propose algorithms to find patterns by employing a newly proposed substring tree structure and an improved Apriori technique. A performance evaluation demonstrates the effectiveness and efficiency of our approach.
We present SUDA2, a recursive algorithm for finding minimal sample uniques (MSUs). SUDA2 uses a novel method for representing the search space for MSUs, and new observations about the properties of MSUs, to prune and traverse this space. Experimental comparisons with previous work demonstrate that SUDA2 is not only several orders of magnitude faster but is also capable of identifying the boundaries of the search space, enabling datasets with larger numbers of columns than before to be addressed.
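An MSU is an attribute subset whose value combination identifies exactly one record while no proper subset of it does. A brute-force sketch makes the definition concrete (SUDA2's contribution is precisely avoiding this exhaustive enumeration; the toy table is illustrative):

```python
# Brute-force check for minimal sample uniques (MSUs). SUDA2 prunes this
# search space; here we simply enumerate subsets. Toy data for illustration.
from itertools import combinations
from collections import Counter

rows = [
    ("M", 30, "UK"),
    ("M", 30, "FR"),
    ("F", 30, "UK"),
    ("F", 40, "UK"),
]

def sample_uniques(rows, cols):
    """Value combinations on the given columns that occur exactly once."""
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    return {key for key, n in counts.items() if n == 1}

def is_msu(rows, cols, key):
    if key not in sample_uniques(rows, cols):
        return False
    for k in range(1, len(cols)):
        for sub in combinations(range(len(cols)), k):
            subcols = tuple(cols[i] for i in sub)
            subkey = tuple(key[i] for i in sub)
            if subkey in sample_uniques(rows, subcols):
                return False       # a proper subset is already unique
    return True

print(is_msu(rows, (1,), (40,)))          # True: age 40 alone is unique
print(is_msu(rows, (0, 1), ("F", 40)))    # False: not minimal
```

The exponential number of subsets per record is why pruning observations about MSU properties matter at scale.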
Data mining aims at the extraction of previously unidentified information from large databases. It can be viewed as an automated application of algorithms to discover hidden patterns and to extract knowledge from data. Online Analytical Processing (OLAP) systems, on the other hand, allow exploring and querying huge datasets in an interactive way. These OLAP systems are the predominant front-end tools used in data warehousing environments, and the OLAP systems market has developed rapidly during the last few years. Several works in the past emphasized the integration of OLAP and data mining. More recently, data mining techniques along with OLAP have been applied in decision support applications to analyze large data sets in an efficient manner. However, in order to integrate data mining results with OLAP, the data has to be modeled in a particular type of OLAP schema. An OLAP schema is a collection of database objects, including tables, views, indexes and synonyms. Schema generation was long considered a manual task, but in recent years research communities have reported work on automatic schema generation. In this paper, we review the literature on schema generation techniques and highlight the limitations of existing works. The review reveals that automatic schema generation has never been integrated with data mining. Hence, we propose a model for data mining and automatic generation of three schema types, namely star, snowflake, and galaxy. The hierarchical clustering technique of data mining was used, and schemas were generated from the clustered data. We have also developed a prototype of the proposed model and validated it via experiments on a real-life data set. The proposed model is significant as it supports both the integration and the automation process.
An effective analysis of clinical trials data involves analyzing different types of data, such as heterogeneous and high-dimensional time series data. Current time series analysis methods generally assume that the series at hand are long enough to apply statistical techniques to them. Other ideal-case assumptions are that data are collected at equal-length intervals, and that when time series are compared, their lengths are equal to each other. However, these assumptions do not hold for many real data sets, especially for clinical trials data sets. In addition, the data sources are different from each other, the data are heterogeneous, and the sensitivity of the experiments varies by source. Approaches for mining time series data need to be revisited, keeping this wide range of requirements in mind. In this paper, we propose a novel approach for information mining that involves two major steps: applying a data mining algorithm over homogeneous subsets of data, and identifying common or distinct patterns over the information gathered in the first step. Our approach is implemented specifically for heterogeneous and high-dimensional time series clinical trials data. Using this framework, we propose a new way of utilizing frequent itemset mining, as well as clustering techniques with novel distance metrics for measuring similarity between time series data. By clustering the data, we find groups of analytes (substances in blood) that are most strongly correlated. Most of these already-known relationships are verified by the clinical panels, and, in addition, we identify novel groups that need further biomedical analysis. A slight modification to our algorithm results in an effective declustering of high-dimensional time series data, which is then used for "feature selection." Using industry-sponsored clinical trials datasets, we are able to identify a small set of analytes that effectively models the state of normal health.
Frequent episode mining has been proposed as a data mining task with the goal of recovering sequential patterns from temporal data sequences. While several episode mining approaches have been proposed in the last fifteen years, most of the developed techniques have not been evaluated on a common benchmark data set, limiting the insights gained from experimental evaluations. In particular, it is unclear how well episodes are actually being recovered, leaving an episode mining user without guidelines in the knowledge discovery process. One reason for this can be found in non-disclosure agreements that prevent the real-life data sets on which approaches have been evaluated from entering the public domain. But even easily accessible real-life data sets would not make it possible to ascertain miners' abilities to identify underlying patterns. A solution to this problem can be seen in generating artificial data, which has the added advantage that the patterns can be known, allowing the accuracy of mined patterns to be evaluated. Based on insights and experiences stemming from consultations with industrial partners and work with real-life data, we propose a data generator for producing diverse data sets that reflect realistic data characteristics. We discuss in detail which characteristics real-life data can be expected to have and how our generator models them. Finally, we show that we can recreate artificial data that has been used in the literature, contrast it with real-life data showing very different characteristics, and show how our generator can be used to create data with realistic characteristics.
With the development of the internet and storage technology, vast amounts of data have accumulated. In order to find the information hidden in these data, data mining has become an increasingly important topic in research as well as in industrial applications. Up to now, many data mining methods and specific tools have been developed. This article mainly discusses a new data mining method called Data Mining based on Lattice. It has been applied in many research areas, such as databases, data analysis and machine learning. The experiments carried out by foreign researchers showed that it may be a useful method for information retrieval and machine learning problem domains. Data Mining based on Lattice is indeed a better method of organization, and is useful in each of these domains. The use and application of Data Mining based on Lattice is an area of active and promising research in various fields. Therefore, it is important for us to study the data mining method based on Lattice.
This paper focuses on a domain-driven data mining outsourcing scenario whereby a data owner publishes data to an application service provider who returns mining results. To ensure data privacy against an untrusted party, anonymization, a widely used technique capable of preserving true attribute values and supporting various data mining algorithms, is required. Several issues emerge when anonymization is applied in a real-world outsourcing scenario. The majority of methods have focused on the traditional data mining paradigm, and therefore neither implement domain knowledge nor optimize data for domain-driven usage. Furthermore, existing techniques are mostly non-interactive in nature, providing little control to users while assuming their natural capability of producing Domain Generalization Hierarchies (DGHs). Moreover, previous utility metrics have not considered attribute correlations during generalization. To successfully obtain optimal data privacy and actionable patterns in a real-world setting, these concerns need to be addressed. This paper proposes an anonymization framework for aiding users in a domain-driven data mining outsourcing scenario. The framework involves several components designed to anonymize data while preserving meaningful or actionable patterns that can be discovered after mining. In contrast with existing works for traditional data mining, this framework integrates domain ontology knowledge during DGH creation to retain value meanings after anonymization. In addition, users can implement constraints based on their mining tasks, thereby controlling how data generalization is performed. Finally, attribute correlations are calculated to ensure the preservation of important features. Preliminary experiments show that an ontology-based DGH manages to preserve semantic meaning after attribute generalization. Also, using Chi-Square as a correlation measure can possibly improve attribute selection before generalization.
By enabling a direct comparison of different security solutions with respect to their relative effectiveness, a network security metric may provide quantifiable evidence to assist security practitioners in securing computer networks. However, research on security metrics has been hindered by difficulties in handling zero-day attacks exploiting unknown vulnerabilities. In fact, the security risk of unknown vulnerabilities has been considered unmeasurable due to the unpredictable nature of software flaws. This poses a major difficulty for security metrics, because a more secure configuration would be of little value if it were equally susceptible to zero-day attacks. In this paper, we propose a novel security metric, k-zero day safety, to address this issue. Instead of attempting to rank unknown vulnerabilities, our metric counts how many such vulnerabilities would be required to compromise network assets; a larger count implies more security because the likelihood of having more unknown vulnerabilities available, applicable, and exploitable all at the same time will be significantly lower. We formally define the metric, analyze the complexity of computing the metric, devise heuristic algorithms for intractable cases, and finally demonstrate through case studies that applying the metric to existing network security practices may generate actionable knowledge.
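The counting idea behind k-zero-day safety can be illustrated as a shortest-path problem on an attack graph where an edge costs 1 if it requires a zero-day exploit and 0 otherwise. The sketch below is a simplification (it ignores the paper's treatment of reusing the same distinct vulnerability along a path); the host names are made up:

```python
import heapq

def k_zero_day(edges, source, target):
    """Minimum number of zero-day exploits on any attack path
    (simplified: reuse of the same distinct vulnerability is ignored)."""
    graph = {}
    for u, v, needs_zero_day in edges:
        graph.setdefault(u, []).append((v, 1 if needs_zero_day else 0))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")  # target unreachable: safe against any number

# Attacker must chain two zero-days to reach the database host
edges = [("internet", "web", True),
         ("web", "app", False),
         ("app", "db", True)]
print(k_zero_day(edges, "internet", "db"))  # → 2
```

A network where every asset has k = 2 is "2-zero-day safe": no single unknown vulnerability suffices to compromise it.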
In order to improve a network's total security, a method of assessing network security risks based on the vulnerability correlation graph is proposed in this paper. First, it defines the vulnerability correlation graph on the basis of network security dependencies. Second, according to the size of the network topology, the method of assessing potential risk based on the vulnerability correlation graph is explained in detail. The experimental results show that it is possible to calculate potential risk indexes at three hierarchies, hosts, subnets and networks, so that system administrators can adjust their security strategies to reduce the potential risk value of the whole network. The method also mitigates the network state explosion problem, thus improving the scalability of the assessment.
The rapid development and wide application of computer networks presents a new challenge to information security and network security. Traditional security models and single security technologies cannot keep up with changing, complicated network structures and varied intrusion methods. Policy-based network security management has the traits of low management cost, high agility and wide applicability. The mobile agent not only collects but also processes data, overcoming the traditional agent's shortcomings, improving response time and relieving network burden. This paper combines policy-based network security management and mobile agents into a new network security framework, and emphasizes its structure, control strategy and implementation.
Network security situational awareness (NSSA) technology, as a new research area in network security, plays an important role in changing the network security defense model from passive to active. In order to enhance the capability of perceiving the network's current status and responding to emergencies, it is especially important to design an index system for large-scale NSSA and a situation assessment algorithm. This paper extends the existing hierarchical evaluation method and introduces a new, highly adaptable network security assessment algorithm based on other researchers' work.
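Hierarchical evaluation methods of this kind typically roll threat scores up through layers (service → host → network) by weighted sums. A minimal sketch of that aggregation step, with invented weights and scores:

```python
def aggregate(weights, scores):
    """Weighted sum used at each layer of a hierarchical assessment."""
    return sum(w * s for w, s in zip(weights, scores))

# Threat scores per service on each host, weighted by service importance
host_a = aggregate([0.6, 0.4], [0.8, 0.5])
host_b = aggregate([0.5, 0.5], [0.2, 0.4])
# Network-level situation value, weighted by host criticality
network = aggregate([0.7, 0.3], [host_a, host_b])
print(round(network, 3))
```

The index system determines which factors are scored at each layer; the weights encode asset importance and are where an adaptive algorithm differs from a static one.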
Along with the extensive application of networks, network security has received increasing attention recently. This paper investigates network security risk evaluation, analyzes traditional risk evaluation methods, and then proposes a new network security risk evaluation method based on the Support Vector Machine (SVM) and a binary tree. Unlike traditional risk evaluation methods, the SVM is a novel learning machine developed on the structural risk minimization principle, with many advantages for small-sample, nonlinear and high-dimensional pattern recognition problems. The principles of the SVM and the binary tree are introduced in detail and applied to network security risk assessment, dividing network security risk into four or more rates. Compared with an ANN in classification precision, generalization performance, and learning and testing time, the SVM achieves higher classification precision, better generalization performance and less learning and testing time, and performs especially well with small samples. This indicates that the SVM has clear superiority in network security risk evaluation; the validity and superiority of the method are demonstrated through experiment.
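The binary-tree decomposition turns one multi-class problem (four or more risk rates) into a cascade of binary decisions, each handled by one SVM. A structural sketch, where simple threshold lambdas stand in for trained SVM decision functions (the thresholds and labels are illustrative):

```python
class Node:
    """One binary classifier in the tree; `clf` stands in for a trained SVM."""
    def __init__(self, clf, left=None, right=None, label=None):
        self.clf, self.left, self.right, self.label = clf, left, right, label

def classify(node, x):
    # Leaf nodes carry the final risk rate
    if node.label is not None:
        return node.label
    branch = node.left if node.clf(x) else node.right
    return classify(branch, x)

# Toy tree separating 4 risk rates with 3 binary decisions
# (thresholds are stand-ins for SVM decision functions)
tree = Node(lambda x: x < 0.5,
            left=Node(lambda x: x < 0.25,
                      left=Node(None, label="low"),
                      right=Node(None, label="medium")),
            right=Node(lambda x: x < 0.75,
                       left=Node(None, label="high"),
                       right=Node(None, label="critical")))
print(classify(tree, 0.8))  # → critical
```

For k classes, the tree needs only k-1 binary classifiers, and each classification evaluates at most the tree depth, which is the efficiency argument for the SVM-plus-binary-tree design.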
A network providing Voice over Internet Protocol (VoIP) service requires many network elements. Each network element may have its own set of security capabilities, but not all security capabilities on all network elements are necessary at the same time for a given network configuration. An end-to-end network view is necessary to choose appropriate security capabilities while minimizing network overhead. For VoIP, using an IP Multimedia Subsystem (IMS) core network and wireless fidelity (Wi-Fi∗) access, the service provider can offer the feature functionality of the core network to both enterprise and residential customers simultaneously. However, both market segments provide their own set of unique security challenges, and what is appropriate for one market segment is not necessarily appropriate for the other. This paper explores various security implications for both of these market segments and proposes options for securing each network configuration. Security aspects of the control plane, bearer plane, and management plane are considered. © 2007 Alcatel-Lucent.
Attack and defense are the two important aspects of the network security issue. Network security situational assessment has focused largely on attack: a number of network security situational assessment models based on attack and threat have been put forward and have obtained good results. Defense, on the other hand, lacks careful consideration. Defense embodies system tolerance, which contains three important factors: system tolerance to attack, system asset tolerance, and system survivability. This article introduces a network security situational assessment model based on both attack and defense. Consulting existing models of network security situational assessment, we summarize a hierarchical model that is based mainly on attack, and then use the three factors of system tolerance to amend it. The new hierarchical model based on attack and defense is closer to the realistic security situation, and the corresponding experiment proves that the improved model, which considers both attack and defense, has a better effect.
With the development and application of network technology, network security issues have become increasingly prominent, and network security risk assessment has become a key process in addressing them. The Support Vector Machine (SVM) is a novel learning machine method whose advantages are simple structure, strong compatibility, global optimization, least training time and better generalization, so it is well suited to network security risk assessment. This paper describes in detail the content and evaluation indicators of network security risk assessment and the classification process of the support vector machine, and then proposes an SVM-based assessment method for network security risk. Experimental results show that the method is feasible and effective.
The attack graph is increasingly becoming a key technique for network security analysis; however, the prevalent Attacker's Ability Monotonic Assumption (AAMA) constraint for attack graph generation cannot make full use of the direction of network attack and the hierarchy of defence. As a result, the AAMA is not efficient enough in the process of attack graph generation, especially for large-scale complicated networks. With the aim of improving the efficiency of attack graph generation and reducing the attack graph's complexity, we propose the concept of the Network Security Gradient (NSG) to reflect the hierarchy of network defence, and the Gradient Attack Assumption (GAA), based on the NSG, to constrain the attack graph generation process. To make the NSG theory more sound and reasonable, we propose two NSG marking algorithms, based respectively on static analysis of network topology and dynamic analysis of network access flow, to rank network nodes automatically. Experimental results show that both algorithms mark the NSG for a network correctly and rationally.
With increasingly more businesses engaging in offshore outsourcing, organisations need to be made aware of the global differences in network security before entrusting a nation with sensitive information. In July 2011, Syn and Nackrst1 explored this topic by analysing seven countries from a wide spectrum across the globe for network security vulnerabilities. The countries selected were China, the United Kingdom, Germany, Russia, India, Mexico and Romania. Their method utilises Nmap and Nessus to probe and test for network vulnerabilities from each respective nation, in order to collect quantitative data on national vulnerability volumes. The vulnerability statistics collected fall into four categories: High, Medium, Low and Open Ports. This paper extends Syn and Nackrst1's work by constructing a more detailed analysis of their results, showing the number of real-world vulnerabilities per nation, the differences between national levels of network security, the ratios of vulnerabilities per IP address, and vulnerability summary rankings. Multiple causal factors are also examined to quantify the reasoning behind the varying levels of vulnerabilities per nation. This paper concludes that each nation has millions of vulnerabilities of varying amounts, and that nations therefore differ in network security levels. Mexico and India exhibited the most worrying statistics, with the highest ratios of high-level vulnerabilities per IP address. Ultimately, this paper highlights the vulnerability levels that organisations face when engaging in foreign and domestic outsourcing.
Cloud computing is rapidly changing the face of the Internet service infrastructure, enabling even small organizations to quickly build Web and mobile applications for millions of users by taking advantage of the scale and flexibility of shared physical infrastructures provided by cloud computing. In this scenario, multiple tenants save their data and applications in shared data centers, blurring the network boundaries between tenants in the cloud. In addition, different tenants have different security requirements, and different security policies are necessary for different tenants. Network virtualization is used to meet a diverse set of tenant-specific requirements with the underlying physical network, enabling multi-tenant data centers to automatically address a large and diverse set of tenants' requirements. In this paper, we propose the system implementation of vCNSMS, a collaborative network security prototype system used in a multi-tenant data center. We demonstrate vCNSMS with a centralized collaborative scheme and deep packet inspection with an open-source UTM system. A security-level-based protection policy is proposed to simplify security rule management for vCNSMS. Different security levels have different packet inspection schemes and are enforced with different security plugins. A smart packet verdict scheme is also integrated into vCNSMS for intelligent flow processing, to protect against possible network attacks inside a data center network.
With the quick development of computer networks, the scale of school campus networks has kept expanding. Security problems of the network have therefore become a focus of research, and well-operated network management has become the key issue for a normal and effective school campus network. This article mainly analyzes the problems and solutions of campus E-government network security, studies the security issues and their roots in the school campus network, probes into the key techniques of the school campus network security management system, and then provides a network security strategy. According to the practical situation on our campus, we design and realize a school campus security management system.
Many data encryption techniques have been employed to ensure both personal data security and network security, but few have been successful in merging both under one roof. The block cipher techniques commonly used for personal security, such as DES and AES, run multiple passes over each block, making them ineffective for real-time data transfer. Also, ciphers for network security such as Diffie-Hellman and RSA require a large number of bits. This paper suggests a simple block cipher scheme to effectively reduce both time and space complexities while still providing adequate security for both security domains. The proposed Reverse Circle Cipher uses `circular substitution' and `reversal transposition' to exploit the benefits of both confusion and diffusion. The scheme uses an arbitrarily variable key length, which may be equal to the length of the plaintext or as small as a few bits, coupled with an arbitrary reversal factor. This method of encryption can be utilized within stand-alone systems for personal data security or even streamed into real-time packet transfer for network security. This paper also analyses the effectiveness of the algorithm with respect to the size of the plaintext and the frequency distribution within the ciphertext.
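The abstract does not give the exact algorithm, but the two named operations can be sketched plausibly: circular (modular) substitution with a repeating key provides confusion, and reversing fixed-size blocks provides a transposition step for diffusion. Everything below is a hypothetical interpretation, not the paper's actual cipher:

```python
def encrypt(plaintext, key, reversal_factor):
    """Hypothetical sketch: circular substitution with a repeating key,
    then reversal of fixed-size blocks (transposition)."""
    subbed = [(b + key[i % len(key)]) % 256
              for i, b in enumerate(plaintext)]
    out = []
    for i in range(0, len(subbed), reversal_factor):
        out.extend(reversed(subbed[i:i + reversal_factor]))
    return bytes(out)

def decrypt(ciphertext, key, reversal_factor):
    # Undo the transposition first, then the substitution
    blocks = []
    for i in range(0, len(ciphertext), reversal_factor):
        blocks.extend(reversed(ciphertext[i:i + reversal_factor]))
    return bytes((b - key[i % len(key)]) % 256
                 for i, b in enumerate(blocks))

key = b"secret"
msg = b"attack at dawn"
ct = encrypt(msg, key, reversal_factor=4)
print(decrypt(ct, key, 4) == msg)  # → True
```

Note how the key length and the reversal factor are both free parameters, matching the abstract's claim of an arbitrarily variable key length and an arbitrary reversal factor; a key as long as the plaintext degenerates the substitution step into a one-time-pad-like operation.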
Malicious software, or malware for short, has become a major security threat. While originating in criminal behavior, its impact is also influenced by the decisions of legitimate end users. Getting agents in the Internet, and in networks in general, to invest in and deploy security features and protocols is a challenge, in particular because of economic reasons arising from the presence of network externalities. An unexplored direction of this challenge consists in understanding how to align the incentives of the agents of a large network towards better security. This paper addresses this new line of research. We start with an economic model for a single agent that determines the optimal amount to invest in protection. The model takes into account the vulnerability of the agent to a security breach and the potential loss if a security breach occurs. We derive conditions on the quality of the protection to ensure that the optimal amount spent on security is an increasing function of the agent's vulnerability and potential loss. We also show that for a large class of risks, only a small fraction of the expected loss should be invested. Building on these results, we study a network of interconnected agents subject to epidemic risks. We derive conditions to ensure that the incentives of all agents are aligned towards better security. When agents are strategic, we show that security investments are always socially inefficient due to the network externalities. Moreover, if our conditions are not satisfied, incentives can be aligned towards lower security, leading to an equilibrium with a very high price of anarchy.
The importance of network security has increased significantly in the past few years. However, the increasing complexity of managing security policies, particularly in enterprise networks, poses a real challenge for efficient security solutions. Network security perimeters such as firewalls, IPSec gateways, and intrusion detection and prevention systems operate based on locally configured policies. Yet these policies are not necessarily autonomous and might interact with each other to construct a global network security policy. Due to the manual, distributed and uncoordinated configuration of security policies, rule conflicts and policy inconsistencies arise, causing serious network security vulnerabilities. In addition, enterprise networks continuously grow in size and complexity, which makes policy modification, inspection and evaluation a nightmare. Addressing these issues is a key requirement for obtaining provable security and seamless policy configuration. Moreover, with growth in network speed and size, the need to optimize security policies to cope with traffic rates and attacks is significantly increasing, and the constant evolution of policy syntax and semantics makes functional testing of these devices for vulnerability penetration a difficult task. This tutorial is divided into three parts. In the first part, we present techniques to automatically verify and correct firewall and IPSec/VPN policies in large-scale enterprise networks. In the second part, we discuss techniques to enhance and optimize policy structure and rule ordering in order to reduce packet matching and significantly improve firewall and IPSec performance. In the third part, we present techniques that can be used by users, service providers and vendors to test their security devices efficiently and accurately.
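A classic instance of the rule conflicts mentioned above is shadowing: an earlier rule matches every packet a later rule matches but takes the opposite action, so the later rule can never fire. A minimal detection sketch over simplified rules of the form (destination network, port range, action); the sample rules are invented:

```python
from ipaddress import ip_network

def shadows(earlier, later):
    """True if `earlier` matches every packet `later` matches
    but takes a different action (a classic policy conflict)."""
    e_net, e_ports, e_action = earlier
    l_net, l_ports, l_action = later
    subset = (ip_network(l_net).subnet_of(ip_network(e_net))
              and e_ports[0] <= l_ports[0] and l_ports[1] <= e_ports[1])
    return subset and e_action != l_action

rules = [("10.0.0.0/8", (0, 65535), "deny"),
         ("10.1.0.0/16", (80, 80), "allow")]   # never reachable
print(shadows(rules[0], rules[1]))  # → True
```

Real firewall policy verifiers generalize this pairwise check over all rule fields (protocol, source, destination, ports) and across devices, which is where the scalability challenge discussed in the tutorial comes from.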
Traditional network security assessment technologies usually perform qualitative analyses over a large variety of security factors, which makes it difficult to guide security managers in configuring network security mechanisms. A new quantitative network security analysis method called ACRL is presented in this paper. It assesses attack sequences in terms of credibility, risk and system loss, and provides the assessment values to security managers. It can assess the network security mechanisms and measures in place, and can help security managers adjust the corresponding security mechanisms and choose detailed response methods against attacks. An experiment with our method shows favorable and promising results.
To solve the security issues of network resources in file sharing, a reputation evaluation system was built based on resource trust degree. A set of reputation evaluation formulas and methods was developed to build a trust model using reputation evaluation based on resource credibility and recommendation reliability, and the reputation evaluation system was applied to network security. Through the trust mechanism, users can obtain the historical experiences of target nodes, thus ensuring network resource security and the integrity of downloaded resources. Users can use the model to select safer network resource service objects, and an incentive mechanism can play a part in selecting network resources.
The proposal of network security situational awareness (NSSA) research represents a great breakthrough and an innovation over traditional network security technologies, and it has become a hot new research topic in the network security field. First, the current research status of the field is introduced; after a summary of former achievements, a layered NSSA realization model is constructed, in which extraction of situational factors is identified as the most basic and important step in realizing NSSA. Situational factors (SF) are defined here, and the extraction method for SF is the main research topic of this paper. Combining an evolutionary strategy with a neural network, an extraction method for situational factors is proposed. The evolutionary strategy is used to optimize the parameters of the neural network, and the resulting evolutionary neural network model is established to extract the SFs, laying the foundation for realizing network security situational awareness. Finally, simulation experiments validate that the evolutionary neural network model can effectively extract situational factors and has better generalization ability, which will greatly accelerate the realization of NSSA.
Network security appliances are deployed at vantage points of the Internet to detect security events and prevent attacks. However, these appliances are not as effective against distributed attacks such as DDoS. This paper presents the design and implementation of a collaborative network security management system (CNSMS), which organizes NetSecu nodes into a hybrid P2P and hierarchical architecture to share security knowledge. NetSecu nodes are organized into a hierarchical architecture so they can realize different management or security functions; at each level, nodes form a P2P network for higher efficiency. To guarantee trustworthy identities and secure information exchange, a PKI infrastructure is deployed in the CNSMS. Finally, experiments are conducted to measure the computing and communication costs.
Forecasting the network security situation is an important part of network security situational assessment. The concept of network security situational assessment is introduced, and an assessment model is summarized based on this concept. The method of GM(1,1) modified by residual error is then considered and improved into an optimal fuzzy grey model, which is used to forecast the value of the network security situational assessment. Compared to the original model, the results show better effects in simulation tests.
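The classic GM(1,1) grey model underlying this approach fits an exponential to the accumulated series and differences it back to get forecasts; the residual-error and fuzzy refinements in the paper build on this base. A plain-GM(1,1) sketch with an invented situation-value series:

```python
from math import exp

def gm11_forecast(x0, steps):
    """Classic GM(1,1) grey model: fit on series x0, forecast `steps` ahead."""
    n = len(x0)
    x1 = [sum(x0[:k + 1]) for k in range(n)]              # accumulated series
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]  # background values
    y = x0[1:]
    # Least-squares solution of the grey equation y[k] = -a*z[k] + b
    s1, s2 = sum(z), sum(v * v for v in z)
    sy, szy = sum(y), sum(v * w for v, w in zip(z, y))
    det = s2 * (n - 1) - s1 * s1
    a = (-szy * (n - 1) + s1 * sy) / det
    b = (s2 * sy - s1 * szy) / det
    def x1_hat(k):
        return (x0[0] - b / a) * exp(-a * k) + b / a
    # Difference the fitted accumulated series to recover forecasts
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]

# Toy situation-value series; forecast the next two periods
series = [2.8, 3.1, 3.3, 3.6, 3.9]
print([round(v, 2) for v in gm11_forecast(series, 2)])
```

GM(1,1) is attractive here because situational data series are short; the residual-error modification in the paper corrects the systematic bias of this plain exponential fit.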
In order to improve the reliability of network security evaluation, the application of the analytic hierarchy process (AHP) to network security assessment was studied in depth. First, the characteristics of network security were introduced. Second, the basic theory of the analytic hierarchy process was analyzed, and the index system for integrated evaluation of network security was established. Third, the AHP analysis process was carried out. Finally, the evaluation was performed, and the results showed that this method has high precision and can be applied effectively in network security assessment.
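The core AHP step is deriving priority weights for the security indexes from a pairwise comparison matrix. A common approximation to the principal eigenvector, normalizing columns and averaging rows, can be sketched as follows (the comparison values are illustrative, on Saaty's 1-9 scale):

```python
def ahp_weights(matrix):
    """Approximate AHP priority vector: normalize each column,
    then average across rows (a stand-in for the principal eigenvector)."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n
            for i in range(n)]

# Pairwise comparisons of three hypothetical security indexes:
# index 1 judged 3x as important as index 2, 5x as important as index 3, etc.
pairwise = [[1,     3,     5],
            [1 / 3, 1,     3],
            [1 / 5, 1 / 3, 1]]
w = ahp_weights(pairwise)
print([round(v, 3) for v in w])
```

The resulting weights sum to 1 and feed the weighted aggregation of index scores; a full AHP implementation would also compute the consistency ratio to validate the comparison matrix.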
Computer security is a hot topic, and more and more Internet users are concerned about computer security. Linux is an operating system like Unix: it has all the features of a Unix operating system, and its security structure is as strict as Unix's. This paper discusses the network security of the Linux system from the technical aspects of network security. It introduces methods of protecting network security, such as setting access rights for client machines and using firewall technology, and describes in detail Netfilter's structure and its position in the network stack.
The exponential growth in wireless network faults, vulnerabilities, and attacks makes wireless local area network (WLAN) security management a challenging research area. Newer network cards implement more security measures according to the IEEE recommendations, but the wireless network is still vulnerable to denial-of-service attacks and other traditional attacks due to the existing wide deployment of network cards with well-known security vulnerabilities. The effectiveness of a wireless intrusion detection system (WIDS) relies on updating its security rules; many current WIDSs use static security rule settings based on expert knowledge. However, updating those security rules can be time-consuming and expensive. In this paper, we present a novel approach based on multi-channel monitoring and anomaly analysis of station localization, packet analysis, and state tracking to detect wireless attacks; we use adaptive machine learning and genetic search to dynamically set optimal anomaly thresholds and select the proper set of features necessary to efficiently detect network attacks. We present a self-protection system with the following salient features: it monitors the wireless network, generates network features, tracks wireless network state machine violations, generates wireless flow keys (WFK), and uses the dynamically updated anomaly and misuse rules to detect complex known and unknown wireless attacks. To quantify the attack impact, we use the abnormality distance from the trained norm and multivariate analysis to correlate the multiple selected features contributing to the final decision. We validate our wireless self-protection system (WSPS) approach by experimenting with more than 20 different types of wireless attacks. Our experimental results show that the WSPS approach can protect against wireless network attacks with a false positive rate of 0.1209% and a detection rate of more than 99%.
Bring Your Own Device (BYOD) security is the way to protect an organization's network against the variety of threats that come through mobile devices and access channels. This research paper explains the implementation of a BYOD security solution in a higher education institution in Oman. The security solution helps protect network data from unauthorized access, as well as control unmanaged devices such as smartphones and other mobile devices. The research follows these steps: literature review, data collection, analysis, design of the network structure with the suggested solution, and implementation of the BYOD security solution, as well as monitoring network performance with the implemented solution to keep track of traffic flow with high availability and security. This research paper will help facilitate work for network users by allowing BYOD while increasing network availability, capability and security through 802.1X, CA and RADIUS.