On an approach for intellectual analysis of registration data of domain names

Rasim Alguliev 1, Rena Gasimova 1

1 Azerbaijan National Academy of Sciences, Institute of Information Technology, 9 F. Agayev str., Baku, Azerbaijan, AZ1141
depart1@iit.ab.az

Abstract. The paper is devoted to forming a knowledge base of the domain name system that serves the interests of the Azerbaijan Republic. To this end, a data warehouse is developed for processing a large volume of domain name registration data, the data are clustered, and rules are generated for extracting new knowledge.

Keywords: domain, domain name system, registrar, registrant, clustering, categorical data.

1 Introduction

As a global telecommunications network of information and computing resources, the Internet creates a global information space and serves as the physical basis for the World Wide Web and a variety of data transfer systems (protocols). The domain name system is used for addressing requests on the Internet. A domain is a region of the domain name space. It is characterized by independent allocation of subdomains, inclusion of information systems in the domain structure, and the availability of special information systems (DNS servers) that hold data on the domain names allocated in the domain, and it performs the function of organizing the domain name space [1].

A domain name is an identifier of a domain and (or) an information system. It has a unique structure defined by a limited set of symbols and a name identifying the domain, which may optionally include a domain label or host name that is unique within the domain. A domain name performs the functions of identification, individualization and addressing [2]. Analysis shows that domain names are also used as a means of unfair and unethical competition.
One example of unethical use of the Internet is the use of famous trademarks, service marks, places of origin of goods, and brand names in domain names [3].

2 Relevance

Currently, the field of domain name registration suffers from a number of deficiencies: lack of transparency in the registration process, violation of registration rules by registrars, irresponsible purchase and use of domain names for the purpose of further sale, and the absence of both a single policy against domain name abusers (cybersquatters, phishers, etc.) and software that allows accurate analysis of the domain name registration data collected in the DNS (Domain Name System). These problems make scientific analysis of the registration data collected on DNS servers necessary. Given the growing number of domain names, it is possible to process and analyze the information collected from thousands of domain names, obtain new knowledge, detect regularities and make the necessary decisions. Intellectual analysis of registration data can help solve such problems as forecasting, making prompt, effective and analytical decisions in the domain field, establishing facts, evaluating the real state of the domain market, and studying domain name monitoring problems.

3 Objective

The objective of the article is to develop a data warehouse for processing a large volume of domain name registration data, to cluster these data, and to generate rules for gaining new knowledge.

4 Tasks

The following tasks are formulated for the research:
1) Processing of domain name registration data in the warehouse and clustering of these data using the CLOPE algorithm;
2) Generation of rules for each cluster using the Magnum Opus v.5.4.1 program, and decision making.
5 Problem Solution

Domain registration information includes: domain name, registrar, name, address, admin-o, admin-c, organization, created, updated, free-date, phone, e-mail, nserver, type, source, paid-till, etc. [4] (fig. 1).

Fig. 1. Domain registration data example

As domain registration data consist mainly of categorical data (which cannot be ordered), traditional algorithms for clustering objects are ineffective. Clustering is a fundamental data analysis and Data Mining task that groups similar objects together. Today, clustering is frequently used as the first step of data analysis [5, 6]. Several factors call for scalable algorithms for clustering categorical data: the high dimensionality (thousands of fields) and large volume (hundreds of thousands or millions of records) of database tables, the difficulty of defining a metric for the distance between categorical values, and the very low performance, and sometimes even inapplicability, of pairwise distance comparison between points (as in k-means) at each iteration over large record arrays.

Most algorithms use a metric based on Euclidean distance as the measure of proximity between objects:

$$d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2},$$

where $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$. A threshold is given for forming clusters: if $d(x, y)$ is within the threshold, then $x, y \in A_i$, $i = 1, \ldots, c$. However, this metric is not always effective, as it forces clusters into a spherical form that is not inherent to them. Consequently, the known k-means clustering algorithms cannot achieve a satisfactory result.

A variety of clustering algorithms have been proposed for working with categorical data, but they do not always meet the requirements listed above. LargeItem, which is based on optimization of a global criterion, is considered one of the effective algorithms.
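To illustrate why Euclidean distance suits categorical fields poorly, the following minimal Python sketch encodes the same records with two different integer codebooks. The registrar names, the record type, and both codebooks are hypothetical and chosen only for illustration; the point is that the distance between two records changes with an arbitrary choice of codes.

```python
import math

def euclidean(x, y):
    """Euclidean distance d(x, y) between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def encode(records, codebook):
    """Replace each categorical value with its integer code."""
    return [tuple(codebook[value] for value in record) for record in records]

# Hypothetical (registrar, type) pairs; not taken from real registration data.
records = [("RegA", "org"), ("RegB", "org"), ("RegC", "org")]

# Two equally valid codebooks: categorical values have no natural order.
coding_1 = {"RegA": 1, "RegB": 2, "RegC": 3, "org": 1}
coding_2 = {"RegA": 1, "RegB": 3, "RegC": 2, "org": 1}

points_1 = encode(records, coding_1)
points_2 = encode(records, coding_2)

# Distance between the same two records (RegA vs RegB) under each coding.
d1 = euclidean(points_1[0], points_1[1])
d2 = euclidean(points_2[0], points_2[1])
print(d1, d2)  # the two values differ although the records are identical
```

Because the result depends on the codebook rather than on the data, any threshold on such a distance is hard to justify, which is one motivation for criteria defined directly on sets of categorical items.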
The CLOPE algorithm, proposed in 2002 by a group of Chinese scientists, solves the clustering task not only for categorical data but for any transactional data [7]. It provides higher performance and better clustering quality than LargeItem and many hierarchical algorithms. The key is that all features of the objects are measured on a nominal scale. However, before CLOPE is launched, the data must be brought to a normalized form. This can be a binary matrix, as in association rules, or a one-to-one mapping between the set of unique objects in the table and a set of integers.

CLOPE is easy to compute and interpret. During operation, the algorithm keeps only a small amount of information about each cluster in RAM and requires a minimal number of scans of the data set. This allows it to be applied to clustering huge volumes of categorical data. CLOPE selects the number of clusters automatically, and this number is regulated by a single parameter, the repulsion coefficient [8]. Thus, in the case under review, CLOPE is one of the effective algorithms. It is based on the idea of maximizing a global criterion, the cost function Profit(C), which increases the proximity of transactions within clusters by increasing the cluster histogram parameter (fig. 2).

Fig. 2. Domain Registration Data Clustering

The global criterion (cost function) is formulated as follows: for a given transaction base $D = \{t_1, t_2, \ldots, t_n\}$ and repulsion coefficient $r$, find a partition $C = \{C_1, C_2, \ldots, C_k\}$ such that

$$\mathrm{Profit}(C, r) \to \max \qquad (1)$$

where $r$ regulates the level of similarity of transactions within a cluster: the larger $r$ is, the larger the final number of clusters. The cost function formula is:

$$\mathrm{Profit}(C, r) = \frac{\sum_{i=1}^{k} \frac{S(C_i)}{W(C_i)^{r}} \, |C_i|}{\sum_{i=1}^{k} |C_i|} \qquad (2)$$
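The normalization step and the cost function (2) can be sketched in Python as follows. This is a simplified illustration, not the implementation used in the paper: it assumes, following the original CLOPE formulation, that S(C_i) is the total number of item occurrences in cluster C_i and W(C_i) is the number of distinct items in it. The helper names (`to_transaction`, `encode_items`, `profit`, `greedy_pass`) and the sample data are hypothetical, and the single greedy pass recomputes the profit from scratch instead of using CLOPE's incremental per-cluster statistics and repeated refinement scans.

```python
from collections import Counter

def to_transaction(record):
    """Normalize a registration record (dict) into a set of 'field=value' items."""
    return frozenset(f"{field}={value}" for field, value in record.items())

def encode_items(transactions):
    """Map every unique item to an integer (one-to-one) and encode the transactions."""
    ids = {}
    encoded = [frozenset(ids.setdefault(item, len(ids)) for item in sorted(t))
               for t in transactions]
    return encoded, ids

def profit(clusters, r):
    """Cost function (2): sum(S/W^r * |C_i|) / sum(|C_i|) over non-empty clusters."""
    numerator, total = 0.0, 0
    for cluster in clusters:
        if not cluster:
            continue
        occurrences = Counter(item for t in cluster for item in t)
        size = sum(occurrences.values())   # S(C_i): total item occurrences
        width = len(occurrences)           # W(C_i): number of distinct items
        numerator += size / (width ** r) * len(cluster)
        total += len(cluster)
    return numerator / total if total else 0.0

def greedy_pass(transactions, r):
    """Put each transaction into the existing or new cluster that maximizes profit."""
    clusters = []
    for t in transactions:
        best_index, best_profit = None, -1.0
        for i in range(len(clusters) + 1):
            trial = [list(c) for c in clusters]
            if i == len(clusters):
                trial.append([t])          # try opening a new cluster
            else:
                trial[i].append(t)         # try joining cluster i
            p = profit(trial, r)
            if p > best_profit:
                best_index, best_profit = i, p
        if best_index == len(clusters):
            clusters.append([t])
        else:
            clusters[best_index].append(t)
    return clusters

# Hypothetical toy transactions; letters stand for 'field=value' items.
toy = [frozenset(s) for s in ("ab", "abc", "acd", "de", "def")]
split_1 = [toy[:3], toy[3:]]
split_2 = [toy[:2], toy[2:]]
print(round(profit(split_1, 2), 4), round(profit(split_2, 2), 4))  # 0.5222 0.4142
result = greedy_pass(toy, r=2)
print([len(c) for c in result])  # [3, 2]
```

On this toy data the first partition scores higher than the second, and the greedy pass recovers it. A larger r penalizes wide clusters more strongly, which matches the statement that a larger repulsion coefficient yields more clusters.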