Answer 1

In most cases, the dataset you examine is in raw format, meaning some data points are incorrect or simply missing. Raw (original) data is data measured and collected directly from its source, such as a machine, the web, or an appliance, and it is usually not in a form ready for analysis. Specific data preprocessing steps exist to estimate missing values or eliminate information that would mislead the study. Four main preprocessing steps need to be performed on data before further analysis:

Data Collection: Data collection (or data consolidation) is the process of gathering and accumulating the data needed for the analysis. It is carried out through various channels such as surveys, emails, legacy systems, mainframes, tools, and the internet.
Data Cleansing: The data gathered during collection is generally raw, and it may be misaligned, contaminated, or missing information. Data cleansing (or data cleaning) purifies the data at this stage by removing duplicates and filling the gaps.
Data Transformation: Data transformation is the systematic conversion of data from one format or data type to another, depending on the analytical procedure. It is one of the most important preprocessing steps because it can have a huge impact on the analysis.
Data Reduction: Suppose we have an enormous dataset on which to perform analysis and mining. Because the dataset is so large, the analysis would be intricate and the mining would require a great deal of time, making the study impractical and infeasible. In such cases, data reduction can be applied to obtain a reduced representation of the data that is much smaller in volume yet closely maintains the integrity of the original data. Once the data has been reduced, the analysis is no longer complex or time-consuming.
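The four steps above can be sketched in Python with pandas; note that the dataset, column names, and thresholds below are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Data collection: an in-memory dict stands in for a survey export,
# database extract, or web scrape.
raw = pd.DataFrame({
    "age":    [25, 25, None, 40, 120],
    "income": [50000, 50000, 62000, None, 75000],
})

# Data cleansing: drop exact duplicates and fill gaps with the median.
clean = raw.drop_duplicates()
clean = clean.fillna(clean.median(numeric_only=True))

# Data transformation: convert income to a log scale for analysis.
clean["log_income"] = np.log(clean["income"])

# Data reduction: keep only plausible rows and the needed columns,
# giving a smaller dataset that preserves the original data's integrity.
reduced = clean[clean["age"] <= 100][["age", "log_income"]]
print(reduced.shape)  # (3, 2)
```

Each step mirrors one item in the list: one duplicate row is removed, two missing values are filled, a derived column is created, and an implausible row plus an unneeded column are dropped.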
References

Liao, T., & Triantaphyllou, E. (2007). Recent advances in data mining of enterprise data: Algorithms and applications.

Wu, C. (2009). Predicting monthly streamflow using data-driven models coupled with data-preprocessing techniques. Water Resources Research, 45(8), W08432. https://doi.org/10.1029/2007WR006737

Answer 2

Raw or original data is not used directly for analytics because it is usually misaligned, complex, dirty, and inaccurate (Alasadi & Bhaya, 2017). Data cleansing is therefore vital to provide data mining algorithms with clean input. The main issues with using raw data in analytics are the following. First, analyzing incorrect data can lead to poor strategic decisions. Second, using large datasets that are uncleaned and unstructured can harm a company rather than benefit it. Lastly, data is not static: it must be cleansed to eliminate duplicates and correctly structured before it is used in data mining activities (Luengo et al., 2020).

Data Preprocessing

Data acquisition: To prepare data for analytics, the data has to be acquired from different sources and combined into a formal dataset.

Importing crucial libraries: The Python language is used to perform the data preprocessing.
To start the process, it is vital to import the libraries that will be used throughout the procedure.

Importing the dataset: Once the dataset is prepared and the libraries are set, it is time to import the acquired dataset.

Identifying and addressing missing values: During preprocessing, it is important to find and handle missing values, which can affect the outcome of the data being processed (Sharda et al., 2019).

Data categorizing: The data being processed must be classified into groups to simplify its future use.

Splitting the dataset: It is recommended to subdivide the dataset into two sets, a training set and a test set, to isolate possible issues.

Standardizing: It is important to standardize the independent variables within the processed dataset.

Significance of Data Preprocessing in Analytics

Data preprocessing has numerous benefits when applied in analytics. First, it leads to reliable, accurate, and better decisions, since it ensures that only correct data is fed to the analytics tools; the result is useful insights that guide decision making. Second, it reduces cost and eases data storage: cleansing removes duplicates, inconsistencies, and unwanted data before analysis, which shrinks the storage required and therefore the cost of storing the data. Lastly, it increases productivity and profits: preprocessing helps employees work with correct data and quickly derive actionable insights used to make informed decisions about marketing, changes in customer needs, and more (Eckroth, 2018). This increases their productivity and the profits made by the organization.

References

Alasadi, S. A., & Bhaya, W. S. (2017). Review of data preprocessing techniques in data mining. Journal of Engineering and Applied Sciences.

Eckroth, J. (2018). A course on big data analytics. Journal of Parallel and Distributed Computing.

Luengo, J., García-Gil, D., Ramírez-Gallego, S., García, S., & Herrera, F. (2020). Big Data Preprocessing: Enabling Smart Data. Springer Nature.

Sharda, R., Delen, D., & Turban, E. (2019). Analytics, data science, & artificial intelligence: Systems for decision support.
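The preprocessing steps listed in Answer 2 can be sketched end to end in Python; the dataset, column names, and split ratio below are illustrative assumptions, not taken from the answer itself.

```python
import pandas as pd                      # importing crucial libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Importing the dataset (built in memory here instead of read from file).
df = pd.DataFrame({
    "plan":  ["basic", "pro", "basic", None, "pro", "basic"],
    "usage": [10.0, 42.0, None, 8.0, 37.0, 12.0],
    "churn": [0, 0, 1, 1, 0, 1],
})

# Identifying and addressing missing values.
df["usage"] = df["usage"].fillna(df["usage"].mean())
df["plan"] = df["plan"].fillna("basic")

# Data categorizing: encode the categorical column as indicator columns.
X = pd.get_dummies(df[["plan", "usage"]], columns=["plan"])
y = df["churn"]

# Splitting the dataset into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Standardizing the independent variables (fit on training data only).
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)
print(X_train_std.shape)  # (4, 3)
```

Fitting the scaler on the training set only, then applying it to the test set, keeps the two sets isolated as the splitting step recommends.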