Question 1

What is the difference between structured and unstructured data?

Accepted Answer

Structured data follows a fixed, predefined schema and is easily searched in relational databases, such as customer records in a table with rows and columns. Unstructured data lacks a predefined format, like emails, videos, or social media posts. Semi-structured data, like JSON or XML, has tags or markers that label its contents but doesn't fit neatly into tables. Big Data often involves all three types, requiring flexible storage and processing methods.
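To make the three categories concrete, here is a minimal Python sketch. The customer names, column names, and helper functions (cities_from_rows, keys_per_record) are invented for illustration and are not part of the AQA specification.

```python
import json

# Structured: every row has the same fixed columns (a relational-style schema).
columns = ("customer_id", "name", "city")
rows = [
    (1, "Asha", "Leeds"),
    (2, "Ben", "Cardiff"),
]

# Semi-structured: keys label the values, but records need not share one shape.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "tags": ["vip"]},'
    ' {"id": 2, "name": "Ben", "address": {"city": "Cardiff"}}]'
)

# Unstructured: free text with no predefined fields to query.
unstructured = "Hi, my parcel arrived late again. Can someone call me back?"


def cities_from_rows(table_rows):
    """Structured data can be queried directly by column."""
    city_index = columns.index("city")
    return [row[city_index] for row in table_rows]


def keys_per_record(records):
    """Semi-structured records may each carry a different set of keys."""
    return [sorted(record.keys()) for record in records]


print(cities_from_rows(rows))
print(keys_per_record(semi_structured))
```

The structured rows can be queried by column straight away, the JSON records have to be inspected one by one because their keys differ, and the free text would need further processing (for example, text analysis) before anything could be extracted from it.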

Question 2

How does Hadoop process Big Data?

Accepted Answer

Hadoop uses the MapReduce programming model to process large datasets in parallel across a cluster of computers. The input is split into chunks; the 'Map' step processes each chunk independently and emits intermediate key-value pairs, which are grouped by key before the 'Reduce' step aggregates them into the final results. Hadoop also includes HDFS (Hadoop Distributed File System), which stores data across multiple nodes and provides fault tolerance by replicating data blocks. This allows datasets as large as petabytes to be processed efficiently.
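The idea behind Map and Reduce can be sketched as a word count in plain Python. This is a single-machine illustration of the programming model only, not Hadoop's actual API; the function names (map_phase, shuffle, reduce_phase, word_count) are assumptions chosen for this example. On a real cluster, each chunk would be mapped on a different node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all intermediate values by key, ready for reducing."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    mapped = []
    for chunk in chunks:  # each chunk could be handled by a separate node
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data big", "data cluster"]))
```

Because each call to map_phase depends only on its own chunk, the Map work can be spread across many machines without coordination; only the grouping step needs to bring values for the same key together.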

Question 3

What are the ethical issues with Big Data?

Accepted Answer

Key ethical issues include privacy (e.g., companies collecting personal data without consent), surveillance (e.g., governments monitoring citizens), and bias (e.g., algorithms discriminating against certain groups). There are also concerns about data security, the digital divide (unequal access to benefits), and environmental impact (energy consumption of data centres). Regulations like GDPR aim to address some of these issues by giving individuals more control over their data.

Question 4

Do I need to know specific Big Data technologies for the AQA exam?

Accepted Answer

You don't need to memorise specific software versions, but you should understand the concepts behind technologies like Hadoop, MapReduce, and NoSQL databases. Be able to explain how they address the challenges of Big Data (e.g., distributed storage, parallel processing). Examples from real-world applications (e.g., Google's use of MapReduce) can strengthen your answers.

Question 5

What is the difference between data mining and Big Data?

Accepted Answer

Data mining is the process of discovering patterns and knowledge from large amounts of data, often using machine learning or statistical methods. Big Data refers to the datasets themselves and the challenges of storing, processing, and analysing them. Data mining can be applied to Big Data, but it can also be used on smaller datasets. Big Data provides the raw material for data mining, but the scale and complexity require specialised tools.
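As a small illustration of pattern discovery, the sketch below counts which pairs of items appear together in shopping baskets. The basket data, the frequent_pairs function, and the min_support threshold are all hypothetical; real data mining tools use more sophisticated methods, but the idea of finding patterns in data is the same, and it works here on a tiny dataset.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
]
print(frequent_pairs(baskets, min_support=2))
```

With three baskets this runs instantly on one machine. With billions of baskets, the same counting would need to be distributed across a cluster, which is where the challenges of Big Data come in.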

Question 6

How does Big Data relate to machine learning?

Accepted Answer

Machine learning algorithms often require large amounts of data to train accurate models. Big Data provides the volume and variety needed for tasks like image recognition, natural language processing, and recommendation systems. However, Big Data also introduces challenges like data cleaning, feature selection, and computational cost. In the AQA course, you should understand that machine learning is a key application of Big Data, but not the only one.

Big Data

Related Topics in AQA A-Level Computer Science

Consequences of uses of computing

Fundamentals of algorithms

Fundamentals of communication and networking

Fundamentals of computer organisation and architecture
