Data Cleaning: Paving a Way for Accurate and Clean Data

  • Laxmi Ahuja Dy. Director, AIIT, Amity University, Noida
  • Bhoomika Singh AIIT, Amity University, Noida
  • Rajbala Simon Addl. Superintendent examination, AIIT, Amity University, Noida
Keywords: ETL (Extract Transform Load) | NLTK (Natural Language Toolkit) | NLP (Natural Language Processing) | RDBMS (Relational Database Management System) | OLTP (Online Transaction Processing System) | ML (Machine Learning)

Abstract

Purpose: Data cleaning plays one of the most important roles in ensuring the quality and reliability of data used for purposes such as data analysis, artificial intelligence and decision making. With the ever-increasing amount of data in this digital age, it has become essential to address the problems of data inconsistency, duplication, incompleteness and inadequacy.

Design/Methodology/Approach: The study draws on other research papers available online, on differing points of view regarding data cleaning, and on datasets produced by applying various data cleaning techniques.

Findings: The paper first discusses data cleansing, its steps and its significance in various fields. It also specifies key dimensions of data quality such as completeness, correctness, consistency, accuracy and uniqueness. It then covers various data cleaning techniques, including ETL and text mining techniques such as NLTK and NLP. Additionally, the paper covers the challenges associated with data cleansing in RDBMS and explores emerging trends and advances in data cleansing during OLTP. The conclusion of the study emphasizes the need for a systematic approach to data cleaning, the importance of evaluating and proposing data cleaning techniques, and major technological improvement in this area.

Originality/Value: This paper helps readers understand the current technologies and the further advancements that can be made in the field of data cleaning.

Paper Type: Theme Based Paper.

Published
2024-06-25
How to Cite
Ahuja, L., Singh, B., & Simon, R. (2024). Data Cleaning: Paving a Way for Accurate and Clean Data. Global Journal of Enterprise Information System, 16(1), 18-25. Retrieved from https://gjeis.com/index.php/GJEIS/article/view/758
Section
Theme Based Papers (TBP)