This article discusses how genetic programming (GP) can be used for record deduplication. Several systems rely on the integrity of their data. A GP-based approach to record deduplication is proposed. Keywords: Genetic Programming, DBMS, Duplication, Optimisation.

Author: Dahn Tojazil
Country: Canada
Language: English (Spanish)
Published (Last): 15 August 2018

The approach combines several different pieces of evidence, each pairing an attribute with a similarity function, extracted from the data content to produce a deduplication function that is able to identify whether two or more entries in a repository are replicas or not.
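The idea of pairing attributes with similarity functions and combining the results into one deduplication function can be sketched as follows. This is a minimal illustration, not the method from the paper: the attribute names, the two similarity functions, the nested-tuple expression format, and the threshold are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Similarity = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """Token-set overlap in [0, 1] (an assumed similarity function)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def exact(a: str, b: str) -> float:
    """1.0 when the normalized strings match, else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


# Each piece of evidence pairs an attribute with a similarity function.
# The attribute names here are hypothetical.
EVIDENCE: List[Tuple[str, Similarity]] = [
    ("title", jaccard),
    ("author", jaccard),
    ("year", exact),
]


def evidence_vector(r1: Record, r2: Record) -> List[float]:
    """Apply every (attribute, similarity) pair to two records."""
    return [sim(r1.get(attr, ""), r2.get(attr, "")) for attr, sim in EVIDENCE]


def evaluate(tree, ev: List[float]) -> float:
    """Evaluate a nested-tuple expression over the evidence vector.

    A tree is an int (index into ev), a float constant,
    or a tuple (op, left, right) with op in {"+", "*"}.
    """
    if isinstance(tree, int):
        return ev[tree]
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    a, b = evaluate(left, ev), evaluate(right, ev)
    return a + b if op == "+" else a * b


def is_replica(r1: Record, r2: Record, tree, threshold: float) -> bool:
    """Deduplication function: combined evidence compared to a threshold."""
    return evaluate(tree, evidence_vector(r1, r2)) >= threshold
```

For example, the tree `("+", ("*", 0, 1), 2)` stands for `title_sim * author_sim + year_sim`. A search procedure such as GP would be responsible for finding a good tree; here it is written by hand.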

A Genetic Programming Approach for Record Deduplication

International Journal of Engineering and Computer Science, Vol 2 No 06. Chitra Devi and S. Starting from the non-duplicate record set, two different classifiers are used: a Weighted Component Similarity Summing (WCSS) classifier identifies duplicate records among the non-duplicate records, and a genetic programming (GP) approach to record deduplication is then applied.
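As a rough sketch of what a weighted similarity-summing classifier does, the snippet below sums per-attribute similarities with weights and compares the normalized total to a threshold. The weights and the threshold value are hypothetical; the text does not say how they are derived, so treat them as placeholders.

```python
from typing import Sequence


def wcss_score(similarities: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of component similarities, normalized by the total weight."""
    if len(similarities) != len(weights):
        raise ValueError("similarities and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, similarities)) / total


def wcss_is_duplicate(
    similarities: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.8,  # assumed cut-off, not taken from the source
) -> bool:
    """Classify a record pair as duplicate when the weighted score is high enough."""
    return wcss_score(similarities, weights) >= threshold
```

With weights `[3.0, 1.0]`, a pair with component similarities `[1.0, 0.5]` scores `(3.0 + 0.5) / 4.0`, so the first attribute dominates the decision.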


Since record deduplication is a time-consuming task even for small repositories, the aim is to foster a method that finds a proper combination of attributes and similarity functions, thus yielding a deduplication function that maximizes performance using a small representative portion of the corresponding data for training purposes.
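A search of this kind can be illustrated with a very small genetic programming loop. The sketch below is an assumption-heavy toy, not the authors' implementation: the training pairs are made-up evidence vectors, the function set is limited to `+` and `*`, the decision threshold is fixed at 1.0, and F1 is used as the fitness measure.

```python
import random

# Small labeled training sample: (evidence vector, is_duplicate).
# All values are invented for illustration.
TRAIN = [
    ([0.95, 0.90, 1.0], True),
    ([0.80, 1.00, 1.0], True),
    ([0.85, 0.95, 0.0], True),
    ([0.90, 0.20, 1.0], False),
    ([0.10, 0.85, 0.0], False),
    ([0.30, 0.10, 1.0], False),
]
N_EVIDENCE = 3
THRESHOLD = 1.0   # assumed decision boundary
MAX_DEPTH = 5     # cap on tree depth to limit bloat


def random_tree(rng, depth=3):
    """Random expression: an evidence index or (op, left, right)."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.randrange(N_EVIDENCE)
    return (rng.choice("+*"), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, ev):
    if isinstance(tree, int):
        return ev[tree]
    op, left, right = tree
    a, b = evaluate(left, ev), evaluate(right, ev)
    return a + b if op == "+" else a * b


def depth_of(tree):
    return 0 if isinstance(tree, int) else 1 + max(depth_of(tree[1]), depth_of(tree[2]))


def f1_fitness(tree):
    """F1 score of the tree used as a duplicate classifier on TRAIN."""
    tp = fp = fn = 0
    for ev, label in TRAIN:
        predicted = evaluate(tree, ev) >= THRESHOLD
        tp += predicted and label
        fp += predicted and not label
        fn += (not predicted) and label
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def random_subtree(tree, rng):
    while isinstance(tree, tuple) and rng.random() < 0.7:
        tree = tree[rng.randint(1, 2)]
    return tree


def crossover(a, b, rng):
    """Graft a random subtree of b onto a random position in a."""
    if isinstance(a, int) or rng.random() < 0.3:
        return random_subtree(b, rng)
    op, left, right = a
    if rng.random() < 0.5:
        return (op, crossover(left, b, rng), right)
    return (op, left, crossover(right, b, rng))


def mutate(tree, rng, depth=3):
    """Replace a randomly chosen subtree with a fresh random one."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, depth - 1), right)
    return (op, left, mutate(right, rng, depth - 1))


def evolve(generations=20, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=f1_fitness, reverse=True)
        history.append(f1_fitness(population[0]))
        elite = population[: pop_size // 5]  # reproduction: best trees survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = crossover(a, b, rng)
            if rng.random() < 0.2:
                child = mutate(child, rng)
            children.append(child if depth_of(child) <= MAX_DEPTH else a)
        population = elite + children
    best = max(population, key=f1_fitness)
    return best, f1_fitness(best), history
```

Calling `best, fitness, history = evolve()` returns the best tree found, its F1 on the training sample, and the best fitness per generation. Because the top trees are copied unchanged into the next generation, the best fitness never decreases.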

The aim is to create a flexible and effective method that uses data mining algorithms. Record deduplication[1] is the task of identifying, in a data repository, records that refer to the same real-world entity or object in spite of spelling mistakes, typing errors, different writing styles, or even different schema representations or data types. The proposed system develops a new method, a modified bat algorithm, for record deduplication.
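To make the definition of record deduplication concrete, the sketch below shows one common way to compare two field values so that spelling mistakes and different writing styles still yield a high score. It uses `difflib.SequenceMatcher` from the Python standard library; the normalization step (lowercasing and collapsing whitespace) is an assumption chosen for the example, not something the text prescribes.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] that tolerates small typos and style differences."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

A misspelling such as "Genetic Programing" still scores close to 1.0 against "Genetic Programming", while an unrelated string scores much lower, which is the property a deduplication function relies on.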

Several systems that rely on the integrity of their data in order to offer high-quality services, such as digital libraries and e-commerce brokers, may be affected by the existence of duplicates, quasi-replicas, or near-duplicate entries in their repositories.


UDD (Unsupervised Duplication Detection), for a given query, can effectively identify duplicates among the query result records of different web databases. However, its results are less well optimized.

The existing system aims at providing an Unsupervised Duplication Detection method that can be used to identify and remove duplicate records from different data stores. The system shares many similarities with evolutionary computation techniques such as the genetic programming approach.