To Investigate Data Mining Challenges And Potential Solutions
Question
Task: It is your responsibility to ensure that your assignment arrives before the submission deadline stated above. See the University policy on late submission of work.
For the KF7028 assignment, you must produce a proposal for a short research project. The proposal is the foundation of the 60-credit project that you will study as part of your postgraduate programme, KF7029 MSc Computer Science & Digital Technologies Project.
This assignment is therefore doubly important:
- It accounts for 80% of the marks for KF7028
- It is essential preparation for your research project.
As stated, the KF7029 research project will account for 60 credits (one-third) of your programme of study. Typically, you will spend 600 hours on this project.
Your KF7028 research proposal will be approximately 2500–3000 words and will specify the project background, motivation and relevance to your programme of study; its scope, aims and objectives; and a plan of the major activities. Your proposal should draw on current and recent research and other appropriate sources of information, and cite these sources using a consistent standard referencing system. The proposal will be assessed by your allocated module tutors or by your supervisor and a second marker. Feedback will be provided, consisting of suggestions and comments on how to improve the project.
You will be allocated your project supervisor in semester two of your study period. If you start your study in September, you will be allocated a project supervisor in your second semester (January–May). If you start your study in January, you will be allocated your supervisor in your second semester (September–January).
This proposal should be based on the outcome of your own research and of the meetings with your tutors, peers and supervisors. Your supervisor/tutors will be confirmed by the end of week 6, by which time you should have produced a draft copy of your proposal. Your supervisor/tutor will provide individual support and formative feedback during the preparation of your research proposal.
Research question
You should aim to outline a research question. This research question defines what you want to study and is the first step in development of your project proposal.
Aim
Produce a summary of what your project will achieve in one or two sentences.
Background, Motivation and Relevance – literature review
- Set out and justify, using supporting literature, where your work will fit in the computing body of knowledge.
- Outline and critically discuss what relevant research has been undertaken, in the past, in your area of interest and why this project is necessary.
- Outline and critically discuss why the research will be of value, in the future, and what the anticipated impact of the outcomes will be.
- Provide a diagram such as a mind map illustrating the position in the computing body of knowledge.
- Outline why you want to undertake this work; areas to consider include personal motivation, skill set and career choices, and a gap in the knowledge base.
Scope, Objectives and Risk
- Outline and justify the scope of your project: what you intend to do. You may also want to say what you will not include that others might expect to be incorporated.
- Outline the numbered steps (SMART objectives) you will have to complete to achieve the aims, which together should provide sufficient work to answer your research question.
After each objective, outline the deliverables and/or outcomes that this objective's work will produce. Outline how you will assess and/or measure the quality of that objective. Outline what research method/design you will adopt for each objective and how the methods will support one another.
- Break down the objectives into tasks and produce a task list. For each task, outline the deliverable and the resources (of all types), skills and time required.
- Produce a risk log (see the example on your Blackboard Assessment tab) for your project. Assess each objective for risks such as technical, personal, resource, time and cost factors.
Sources and use of Knowledge
- Identify and justify a journal where your work would be relevant for publication.
- Identify and apply aspects of the target journal's publication standards that you can incorporate into your project proposal, for example the referencing system, font style and pitch; these are usually found in the journal's advice to authors.
- Identify, justify and critically discuss the authors and journals through which relevant research has been published within the last 5 years.
Ethics, Legal, Social, Security and Professional Issues
Using literature, write out a separate section for Ethics, Legal, Social, Security and Professional Issues. In your deliberations discuss, explore and define all of the issues associated with your project, including how you will consider security issues. If you think an area of this section is not applicable to your project, you should justify why this is the case; leaving any section blank will result in a loss of marks.
Answer
Abstract
This document presents a proposal for analyzing and investigating data mining challenges and finding potential solutions to them. The proposal covers the main ways of focusing on data mining challenges, together with a literature review, relevance, sources and use of knowledge, scope, risk, and ethical and related issues. It concludes with a schedule of activities comprising a task list, a work breakdown structure and a Gantt chart. The overall aim is to investigate and analyze data mining challenges, supported by a detailed mind map. Data mining is the practice of examining large databases to generate new, insightful information. A key feature of data mining is predicting patterns that indicate likely outcomes, which supports decision making. In short, data mining extracts valuable, useful information from bulk data.
1 Aim
The project aims to critically analyze and evaluate the problems of data mining and to investigate potential solutions. Data mining challenges and potential solutions are studied with reference to the tools and techniques that support useful analysis and evaluation. The research question is: what types of data mining challenges do organizations face, and what solutions do they adopt?
2 Background, Literature review and Motivation
2.1 Background
An enormous amount of data is collected every minute. Data mining is the process of analyzing this data and summarizing it into useful information. It involves finding patterns in data, often held in large relational databases, and the term also covers the technology used to filter data and store it appropriately (Hassani, Huang & Silva, 2019). Continuous innovation in disk storage has dramatically reduced the cost of keeping data, so ever larger volumes are retained; this makes the challenges of mining them, and their potential solutions, worth examining in detail. The project is worthwhile because it will help to identify data mining challenges and ways of addressing them. The proposal includes a literature review, motivation, scope, objectives, the risks of undertaking the project and ethical considerations, together with a Gantt chart and a monitoring and control table. The dissertation will use secondary sources to achieve the project objectives. One relevant article on data mining challenges and solutions is "Critical analysis of Big Data challenges and analytical methods" (Sivarajah, Kamal, Irani & Weerakkody 2017), published in the Journal of Business Research.
A mind map has been created to show the activities involved in addressing data mining challenges and their solutions. Its branches cover areas of data mining such as data modeling, logistic, data handling, regression, visualization, clustering, support and decision trees.
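To illustrate the clustering branch of the mind map, the following is a minimal Python sketch of k-means clustering (Lloyd's algorithm) on two-dimensional points. The point set, the choice of k = 2 and the function name `kmeans` are hypothetical choices for illustration; they are not part of the proposal's method.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters with Lloyd's algorithm (minimal sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, clusters = kmeans(points, k=2)
```

On this toy data the two clusters recovered are the three points near (1, 1) and the three points near (8.5, 8.5); real data mining work would normally rely on a tested library implementation rather than this loop.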
2.2 Literature review
Big data analysis is an advanced process that applies analysis and visualization methods to extensive data in order to uncover hidden patterns for effective decision making. Big data mining has multiple phases, including data acquisition, recording, extraction, integration and data modeling (Sun et al., 2017). Each phase carries data mining challenges that organizations need to address with effective solutions (Ma et al., 2017). Data mining is the process of extracting knowledge from data stored in sources such as data warehouses, databases and file systems. The process brings both challenges and benefits, and secure, safe techniques are needed for efficient data analysis. Large data sets from which knowledge and patterns can be revealed are called Big Data (Wu, Guo, Li & Zeng 2016). This high volume of data exists in both structured and unstructured forms, and data mining aims to gain overall benefit from it. The goal is to gather, process and analyze data to obtain valuable results, so that big data analysis supports better decisions and strategic moves. Zhu, Cui, Wang and Hua (2015) report that every minute Google receives more than 4 million queries, more than 200 million messages are sent, and Facebook users share more than 2 million pieces of content. Because data grows minute by minute, traditional database tools struggle to keep pace, which gives rise to several data mining challenges. A supervised method will be used to guide the literature review.
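The five phases listed above (acquisition, recording, extraction, integration and modeling) can be sketched as a chain of functions in which each phase feeds the next. This is a minimal illustration only: the source names, the toy log lines, the function names and the frequency-count "model" are all hypothetical.

```python
from collections import Counter

# Hypothetical raw lines from two sources, in the form "<date> <action> <query>".
RAW_SOURCES = {
    "web_log": ["2020-04-01 search data mining", "2020-04-01 search Big Data"],
    "survey": ["2020-04-02 search Data Mining"],
}


def acquire(sources):
    """Acquisition: collect raw lines from every source."""
    return [(name, line) for name, lines in sources.items() for line in lines]


def record_raw(raw):
    """Recording: store each raw line together with its origin."""
    return [{"source": name, "raw": line} for name, line in raw]


def extract(records):
    """Extraction: split each raw line into a date and a query string."""
    extracted = []
    for r in records:
        date, _, rest = r["raw"].partition(" ")
        _, _, query = rest.partition(" ")  # drop the action word
        extracted.append({"source": r["source"], "date": date, "query": query})
    return extracted


def integrate(rows):
    """Integration: normalize values so records from different sources match."""
    return [{**row, "query": row["query"].strip().lower()} for row in rows]


def model(rows):
    """Modeling: a deliberately trivial model that counts query frequencies."""
    return Counter(row["query"] for row in rows)


query_counts = model(integrate(extract(record_raw(acquire(RAW_SOURCES)))))
```

The integration step matters here: without lower-casing, "data mining" and "Data Mining" from different sources would be counted as different queries.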
Data mining challenges and issues: The main problems in data mining are heterogeneity, complexity and timeliness, alongside the scale and privacy of big data mining, all of which make data difficult to manage. Big data analytics is hard to manage because it draws on extensive data sets containing a mixture of collected data (Götz et al., 2015). Even when data is stored effectively, it forms a heterogeneous mixture with many patterns, properties and aspects. Stored data has no single format; around 80% of information is said to be unstructured, which makes it hard to manage and use effectively (Traore, Kamsu-Foguem & Tangara 2017). The information collected is dynamic and disorganized, which makes it difficult to use in different forms. Data exists in email attachments, PDFs, documents and medical records, which are difficult to store. Because so much data is unstructured, data analysts put effort into converting it into a structured form. Transforming information is therefore a significant challenge in big data mining, and various technologies are used to deal with it.
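As a small example of turning unstructured material into structured data, the sketch below uses Python's standard `email` package to convert one raw email into a flat record that could be stored in a table. The email text, the field names and the `word_count` feature are hypothetical; real email archives, PDFs and medical records would each need their own parsers.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical raw email text (headers, blank line, then body).
RAW_EMAIL = """From: analyst@example.com
To: team@example.com
Subject: Quarterly sales figures
Date: Wed, 01 Apr 2020 09:30:00 +0000

Sales rose in the north region; see the attached report.
"""


def to_record(raw):
    """Convert one raw email into a structured record (a dict of named fields)."""
    msg = message_from_string(raw)
    body = msg.get_payload()  # a plain string for a single-part message
    return {
        "sender": msg["From"],
        "recipient": msg["To"],
        "subject": msg["Subject"],
        "date": parsedate_to_datetime(msg["Date"]).date().isoformat(),
        "body": body.strip(),
        "word_count": len(body.split()),  # a simple derived feature
    }


email_record = to_record(RAW_EMAIL)
```

Once every email is reduced to the same fields, records can be loaded into a database and mined alongside other structured data.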
Complexity is high and increases with the volume of data, which is a challenge for organizations because traditional tools are not sufficient to manage big data. Data analysis is complicated, and this complexity grows with the scale of the data mining structure (Sivarajah, Kamal, Irani & Weerakkody 2017). The larger the data, the longer it takes to analyze. In many situations immediate action is needed when problems are suspected, yet it takes time to search an extensive data set for records that meet specific criteria. Scanning the whole data set for each analysis is impractical (Schnase et al., 2016), so advanced index structures are used to find qualifying elements quickly. Data protection is another major challenge, with security threats among the primary data mining challenges. Typical threats include an unauthorized user accessing documents and files, an unauthorized user sniffing data sent to a client, and an unauthorized client obtaining private information; organizations should focus on these threats and improve their techniques to protect against them (Wu, Zhu, Wu & Ding 2014). The methods used include authentication, encryption, authorization and audit trails.
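The difference between scanning all the data and using an index can be sketched in Python. Here a sorted list searched with the standard `bisect` module stands in for the advanced index structures mentioned above (production databases typically use B-trees or similar structures), and the record layout of (customer ID, purchase amount) pairs is a hypothetical example.

```python
import bisect

# Hypothetical records: (customer_id, purchase_amount).
records = [(i, (i * 37) % 1000) for i in range(10_000)]


def scan(records, low, high):
    """Full scan: check every record on every query (O(n) per query)."""
    return sorted(rid for rid, amount in records if low <= amount <= high)


class SortedIndex:
    """A minimal sorted index on purchase_amount for fast range queries."""

    def __init__(self, records):
        # Built once in O(n log n); later queries reuse it.
        pairs = sorted((amount, rid) for rid, amount in records)
        self.keys = [amount for amount, _ in pairs]
        self.ids = [rid for _, rid in pairs]

    def query_range(self, low, high):
        """Binary-search the boundaries instead of scanning (O(log n + k))."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return sorted(self.ids[start:end])  # sorted only to compare with scan()


index = SortedIndex(records)
```

Both `scan(records, 100, 110)` and `index.query_range(100, 110)` return the same customers, but the indexed query only touches the matching slice once the index has been built, which is why indexing pays off when the same data is queried repeatedly.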
2.3 Motivation
Encryption can be used to protect data and store it securely. It preserves privacy and confidentiality and secures sensitive data: even if malicious users gain access to the data, encryption provides consistent layers of protection and high data security (Guo, Zhang & Zhu 2015). Encryption is a cost-effective way of addressing several data security threats at once. Access controls are implemented and control privileges are managed to enhance security (Jaseena & David 2014). Secure communication is also essential: securing communication between applications and nodes with SSL protects network traffic and preserves data privacy in Big Data settings (Ramo, Garcia, Rodriguez & Chuvieco 2018). To detect threats and unusual behavior, activity must be recorded, and scalable data management platforms are a natural fit for collecting and managing these records.
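Access control and activity recording can be combined in a few lines: each request is checked against the privileges of the user's role, every attempt is written to an audit trail, and the trail is then searched for unusual behavior. The role names, the permission table and the threshold of two denied attempts are hypothetical choices; a simple threshold rule is only the most basic stand-in for the behavior analytics a real data platform would use.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical role-to-privilege mapping.
PERMISSIONS = {
    "analyst": {"read"},
    "admin": {"read", "write", "delete"},
}

audit_trail = []  # every access attempt, allowed or denied, is appended here


def authorize(user, role, action, resource):
    """Check the role's privileges and record the attempt in the audit trail."""
    allowed = action in PERMISSIONS.get(role, set())
    audit_trail.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    })
    return allowed


def flag_unusual(trail, max_denied=2):
    """Flag users whose number of denied attempts exceeds a simple threshold."""
    denied = Counter(entry["user"] for entry in trail if not entry["allowed"])
    return sorted(user for user, count in denied.items() if count > max_denied)


authorize("alice", "analyst", "read", "sales.csv")
for _ in range(3):
    authorize("mallory", "analyst", "delete", "sales.csv")
print(flag_unusual(audit_trail))  # ['mallory']
```

Keeping denied attempts in the trail, not just successful ones, is what makes the later threat detection possible.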
3 Sources and use of knowledge
3.1 Journal
Various journals have been used to investigate data mining challenges and potential solutions. The work follows IEEE standards and draws mainly on IEEE journal papers, especially surveys and papers on techniques. IEEE journals cover computing and information technology, including the practices needed to address data mining challenges, and the published articles used in this research discuss new points and report results obtained with different techniques.
3.2 Standards
The dissertation will follow the IEEE standard of the selected journal, using the following format.
Font: Times New Roman. Sizes: 8 and 9 for tables of content and references, 10 for the body of the article, 11 for author names.
Margins: top 19 mm, bottom 25 mm, left and right 17 mm.
References: references will be numbered consecutively from [1], cited in the text by their number in square brackets, and listed in that numbered order in the reference list.
3.3 Relevant authors
After creating a mind map of data mining challenges and solutions, the keywords used were "data mining challenges" and "solutions for data mining", which appear to be the most relevant to the project. The project will not use other authors' information, whether from established or new authors, without acknowledgement. IEEE Transactions journals have good impact factors and will be used in the project.
4 Scope, objectives, and risk
4.1 Scope and Objectives
The study aims to evaluate and investigate the challenges of data mining and to provide potential solutions for managing data mining and data threats in organizations. To achieve this, data mining techniques will be examined and proposed. The focus is on identifying the challenges of data mining and working out effective solutions. The scope of the project includes the following objectives:
- To investigate the problems of data mining
- To evaluate the problems of data mining
- To find useful and potential solutions for data mining
4.2 Risk
The risk analysis considers each risk event, its risk value, how it will be monitored and the strategy for managing it. The table below details the overall risks identified for completing the project.
Risk type | Risk event | Likelihood (1-10) | Impact (1-10) | Risk value (1-100) | Risk monitoring | Risk management strategy | Risk review
T | The topic is new to me and may be difficult to understand clearly | 2 | 4 | 8 | Delay in research | Reading and analyzing practices | During research
T | The resources are not adequate | 3 | 4 | 12 | Closure of library | Manage through internet resources | During research
T | Data implementation and working with image datasets | 4 | 4 | 16 | Error | Communicate with colleagues | During research
F | No financial risk | 0 | 0 | 0 | N/A | N/A | N/A
P | No human resources risk | 0 | 0 | 0 | N/A | N/A | N/A
S | No security risk | 0 | 0 | 0 | N/A | N/A | N/A
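Because likelihood and impact are both scored from 1 to 10 and the risk value runs from 1 to 100, the risk value is taken here to be likelihood multiplied by impact. The Python sketch below computes and ranks the three technical risks from the log on that assumption; the rows scored 0 ("no risk") are left out, and the function name `risk_value` is illustrative.

```python
# Likelihood and impact scores for the technical risks in the risk log.
risks = [
    ("T", "Topic is new and may be difficult to understand", 2, 4),
    ("T", "Resources are not adequate", 3, 4),
    ("T", "Data implementation and image datasets", 4, 4),
]


def risk_value(likelihood, impact):
    """Risk value on a 1-100 scale, assumed to be likelihood x impact."""
    if not (1 <= likelihood <= 10 and 1 <= impact <= 10):
        raise ValueError("scores must be between 1 and 10")
    return likelihood * impact


# Highest-value risks first, so they are reviewed first.
ranked = sorted(
    ((risk_value(likelihood, impact), event) for _, event, likelihood, impact in risks),
    reverse=True,
)
for value, event in ranked:
    print(f"{value:3d}  {event}")
```

Ranking by this product puts working with image datasets (16) ahead of inadequate resources (12) and the unfamiliar topic (8).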
5 Ethics, Legal, Social, Security and Professional Issues
5.1 Ethics
The project carries no ethical risk: it focuses on datasets that help to investigate the challenges of data mining and to find potential solutions. It provides researchers in the field with information that supports useful analysis. Data confidentiality will not be a problem, because the data and research used are in the public domain. The project will not affect any person or any researcher investigating data mining challenges, nor will it harm any person or damage the research. The research will be conducted under University regulations on research conduct, and the researcher will follow the relevant rules in undertaking the project.
5.2 Legal
No external organizations are involved in the project, so the information it uses will be legitimate and will be handled with care wherever it touches on politically sensitive issues. The information in the project will be analyzed properly, following the technical, standard protocols generally adopted by professional organizations. The project will adhere to University regulations and comply with government laws that set out the criteria it must meet. No legal issues are expected while working on the project, as only public-domain databases will be used, in line with their stated rules and regulations. In addition, no participants will take part in the research, and no vulnerable groups will be involved.
5.3 Social
This project will be carried out with the utmost consideration for society. It raises no data mining concerns for the community, and the research activity will be carried out responsibly. The project will not draw on human resources from any organization. It is intended to offer guidance while ensuring that no activities are performed that could harm the nation or the community.
Furthermore, the research will not involve participants or groups. It will not involve vulnerable groups or information that is harmful to society at large. The data will be managed and analyzed appropriately, using technical protocols as the standard of knowledge. The research will have no environmental impact, and the researcher will abide by the relevant terms and conditions. This will be theoretical research that presents relevant information together with a practical analysis of data mining.
5.4 Security
This project will use public-domain datasets, which will be kept secure, and no activities requiring human participants will be performed. Security issues are therefore not expected to be significant, and the project will consider practical solutions to any that arise.
5.5 Professional
No professional issues specific to data mining are expected in the project, as it will not require other people to work on it or follow its rules. The datasets and the potential solutions for businesses should therefore support useful research. The datasets used will be ones already used in published studies. As a professional, I will carry out the project with full responsibility and commitment. In line with University standards, I will act with honesty and integrity in researching the studies and meeting the objectives set out in this document, and I will follow the professional code of conduct throughout the research project and in producing the report.
6 Schedule of activities
6.1 Gantt chart
A Gantt chart has been prepared to schedule each task of the project within a set period. The dissertation is expected to take 3 months, and all the relevant tasks are planned in detail.
6.2 Monitoring and control
The task list below sets out each objective of the project with its duration, planned start and end dates, and deliverable; these are used to monitor and control each project phase.
Objectives/Milestones | Duration in days | Planned start | Planned end | Deliverable
1 Reading literature | 15 | 01-04-2020 | 14-04-2020 | Written report
2 Finalize objectives | 15 | 15-04-2020 | 30-04-2020 | Written report
3 Draft literature review | 6 | 01-05-2020 | 07-05-2020 | Written report
4 Create research approach | 7 | 08-05-2020 | 14-05-2020 | Written report
5 Draft research strategy | 1 | 15-05-2020 | 15-05-2020 | Written report
6 Review secondary sources | 15 | 16-05-2020 | 31-05-2020 | Written report
7 Analyze data | 1 | 01-06-2020 | 01-06-2020 | Written report
8 Pilot test | 7 | 02-06-2020 | 09-06-2020 | Written report
9 Enter data | 5 | 10-06-2020 | 15-06-2020 | Written report
10 Draft findings | 4 | 15-06-2020 | 19-06-2020 | Written report
11 Update literature | 10 | 20-06-2020 | 30-06-2020 | Written report
6.3 Work breakdown structure
7 References
Götz, M, Richerzhagen, M, Bodenstein, C, Cavallaro, G, Glock, P, Riedel, M & Benediktsson, JA, 2015, On scalable data mining techniques for earth science, Procedia Computer Science, vol. 51, pp. 2188–2197.
Guo, HD, Zhang, L & Zhu, LW, 2015, Earth observation big data for climate change research, Advances in Climate Change Research, vol. 6, pp. 108–117.
Hassani, H, Huang, X & Silva, E, 2019, Big Data and climate change, Big Data and Cognitive Computing, vol. 3, no. 12, pp. 1–17.
Jaseena, K & David, J, 2014, Issues, challenges and solutions: Big Data mining, Computer Science and Information Technology, pp. 132–142.
Ma, Z, Xie, J, Li, H, Sun, Q, Si, Z, Zhang, J & Guo, J, 2017, The role of data analysis in the development of intelligent energy networks, IEEE Network, vol. 31, pp. 88–95.
Ramo, R, Garcia, M, Rodriguez, D & Chuvieco, E, 2018, A data mining approach for global burned area mapping, International Journal of Applied Earth Observation and Geoinformation, vol. 73, pp. 39–51.
Schnase, JL, Lee, TJ, Mattmann, CA, Lynnes, CS, Cinquini, L, Ramirez, PM & Webster, WP, 2016, Big data challenges in climate science: Improving the next-generation cyberinfrastructure, IEEE Geoscience and Remote Sensing Magazine, vol. 4, pp. 10–22.
Sivarajah, U, Kamal, M, Irani, Z & Weerakkody, V, 2017, Critical analysis of Big Data challenges and analytical methods, Journal of Business Research, vol. 70, pp. 263–286.
Sun, Q, Miao, C, Duan, Q, Ashouri, H, Sorooshian, S & Hsu, KL, 2017, A review of global precipitation data sets: Data sources, estimation, and intercomparisons, Reviews of Geophysics, vol. 56, pp. 79–107.
Traore, BB, Kamsu-Foguem, B & Tangara, F, 2017, Data mining techniques on satellite images for discovery of risk areas, Expert Systems with Applications, vol. 72, pp. 443–456.
Wu, J, Guo, S, Li, J & Zeng, D, 2016, Big data meet green challenges: Greening big data, IEEE Systems Journal, vol. 10, pp. 873–887.
Wu, X, Zhu, X, Wu, GQ & Ding, W, 2014, Data mining with big data, IEEE Transactions on Knowledge and Data Engineering, vol. 26, pp. 97–107.
Zhu, W, Cui, P, Wang, Z & Hua, G, 2015, Multimedia big data computing, IEEE MultiMedia, vol. 22, pp. 96–103.