Data Cleaning: Paving a Way for Accurate and Clean Data
Abstract
Purpose: Data cleaning plays one of the most important roles in ensuring the quality and reliability of data used for purposes such as data analysis, artificial intelligence, and decision making. With the ever-increasing volume of data in the digital age, it has become essential to address the problems of data inconsistency, duplication, incompleteness, and inadequacy.
Design/Methodology/Approach: The study draws on research papers available online to review different points of view on data cleaning, together with datasets produced by applying various data cleaning techniques.
Findings: The paper first discusses data cleansing, its steps, and its significance in various fields. It then identifies key dimensions of data quality: completeness, correctness, consistency, accuracy, and uniqueness. The paper also covers data cleaning techniques, including ETL (extract, transform, load) and text mining techniques based on NLP tools such as NLTK. Additionally, it covers the challenges associated with data cleansing in relational database management systems (RDBMS), and it explores emerging trends and recent advances in data cleansing for online transaction processing (OLTP). The study concludes by emphasizing the need for a systematic approach to data cleaning, the importance of evaluating existing data cleaning methods and proposing new ones, and the major technological improvements in this area.
Originality/Value: This paper helps readers understand current data cleaning technologies and the further advancements that can be made in the field.
Paper Type: Theme Based Paper.
Copyright (c) 2024 Global Journal of Enterprise Information System
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.