Introduction to Big Data
With the growth of online services and the extensive use of the Internet, the practices of businesses, stock markets, economies, and government organizations have changed. This has in turn changed the way people live and use technology. Alongside this growth, the volume of information that flows and is collected every day has increased and is now larger than ever.
Such outbreaks of data are relatively new, because every user and organization can now store information in digital form. Handling this rapid growth in data requires suitable mechanisms and approaches, and Big Data is one of them. In this lesson, you will learn what Big Data is, why it is important, and how it contributes to large-scale data handling.
What is Big Data?
Big Data is a term used to describe large volumes of data, both structured and unstructured, that systems and businesses generate and that keep growing day by day. However, it is not the quantity of data that is essential. What matters is what a firm or organization can do with the data. Big data can be analyzed for insights and predictions, which can lead to better decisions and more reliable business strategies.
Types of Big Data
The data generated in bulk amounts with high velocity can be categorized as:
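Structured Data: Data organized in a fixed schema of rows and columns, such as relational data.
Semi-structured Data: Data that carries some organizing markers but no fixed schema, for example XML and JSON data.
Unstructured Data: Data in varied formats with no predefined schema, such as document files, multimedia files, images, backup files, etc.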
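To make the three categories concrete, here is a minimal Python sketch. The sample values (the CSV rows, the JSON records, and the free-text note) are invented purely for illustration; only the standard library is used.

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table (shown here as CSV).
structured = io.StringIO("id,name,amount\n1,Asha,250\n2,Ravi,120\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing JSON whose records may carry different fields.
semi_structured = json.loads(
    '[{"id": 1, "tags": ["new"]}, {"id": 2, "address": {"city": "Pune"}}]'
)
second_record_fields = sorted(semi_structured[1].keys())

# Unstructured: free text with no schema; any structure has to be extracted.
unstructured = "Customer called on Monday and said the delivery was late."
word_count = len(unstructured.split())

print(rows[0]["name"])
print(second_record_fields)
print(word_count)
```

Note how the structured rows can be addressed by column name straight away, the JSON records have to be inspected one by one because their fields differ, and the text offers nothing but raw words until further processing is applied.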
8V’s of Big Data
1. Volume:
When we talk about big data, volume is probably the first criterion to consider. The volume determines whether a data set should be considered ‘big’ or not. Usually, data is considered big from a volume perspective only when it exceeds the gigabyte range, that is, when it is measured in terabytes, petabytes, or exabytes. These thresholds are based on data surveys of different organizations.
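The relation between these units can be shown with a small helper. This is only a sketch: the function name human_size is hypothetical, and it assumes decimal units (1 TB = 1,000 GB) rather than binary ones.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_size(num_bytes: int) -> str:
    """Express a byte count in the largest decimal unit that fits."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the number drops below 1000,
        # or at the last unit available.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000

print(human_size(5 * 10**12))
print(human_size(2 * 10**15))
```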
2. Velocity:
Stream analytics, in which high-speed data is processed by tools as it arrives, is a popular term today. The characteristic of big data that stream analytics is associated with is velocity. Here, velocity means the speed at which data is generated and how frequently it is delivered and analyzed.
The amount of data generated today is massive, and much of it needs real-time processing for analysis. For example, Google alone processes more than 40,000 search queries per second. From this, we can imagine how fast the processing must be to get insights from data.
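To get a feel for these numbers, the sketch below first scales the quoted per-second figure to a full day, and then shows one simple way a stream processor might measure velocity: counting events inside a sliding time window. The RateWindow class and the sample timestamps are hypothetical and only illustrate the idea.

```python
from collections import deque

# Scale the quoted figure of 40,000 queries per second to one day.
QUERIES_PER_SECOND = 40_000
queries_per_day = QUERIES_PER_SECOND * 60 * 60 * 24

class RateWindow:
    """Tracks how many events arrived in the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp: float) -> None:
        self.timestamps.append(timestamp)
        # Drop events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()

    def rate(self) -> float:
        """Events per second over the current window."""
        return len(self.timestamps) / self.window_seconds

window = RateWindow(window_seconds=1.0)
for t in [0.0, 0.5, 1.0, 1.5, 2.0]:
    window.add(t)

print(queries_per_day)
print(window.rate())
```

The window keeps only recent events in memory, which is the usual reason for this design in stream processing: the full history is never stored.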
3. Variety:
Big data deals with any data format: structured, semi-structured, unstructured, or even very complex. Storing and processing such unformatted data in an RDBMS is not easy. However, unstructured data can provide valuable insights that we rarely get from structured data. Variety also refers to the different sources the data comes from, so this characteristic of big data provides information about the data sources as well.
4. Veracity:
Not all data that arrives for processing is valuable. Unless the data is cleansed properly, it is not wise to store or process all of it, especially when the volume is so massive. This is where the veracity dimension of big data comes in. This characteristic helps determine whether the data comes from a reliable source and whether it is the right fit for the analytic model.
5. Variability:
In big data analysis, data inconsistency is common because the data comes from different sources and contains different data types. Hence, anomaly and outlier detection are essential for getting meaningful information out of such an enormous amount of data. This is why variability is considered one of the characteristics of big data.
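Outlier detection can take many forms. As a minimal sketch, the example below uses the interquartile range (IQR) rule, which is one common technique and not necessarily the one a given big data tool applies. The function name iqr_outliers and the sample response times are made up for illustration.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical response times in milliseconds; one reading is clearly off.
response_times_ms = [10, 11, 9, 10, 12, 10, 95]
outliers = iqr_outliers(response_times_ms)
print(outliers)
```

The IQR rule is used here because a single extreme value barely shifts the quartiles, whereas it would distort a mean-based threshold.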
6. Value:
The primary interest in big data is probably its business value, which is perhaps its most crucial characteristic. Unless you get business insights out of the data, the other characteristics of big data have little meaning.
7. Visualization:
Processing big data is not enough to get a meaningful result out of it. Unless the result is represented or visualized in a meaningful way, there is little point in analyzing it. Hence, big data must be visualized with appropriate tools that present its different parameters and help data scientists and analysts understand it better.
However, plotting billions of data points is not an easy task. It involves techniques such as treemaps, network diagrams, and cone trees.
8. Validity:
Validity has some similarities with veracity. As the word suggests, the validity of big data means how correct the data is for its intended purpose. Interestingly, a considerable portion of big data goes unused and is known as ‘dark data’. The remaining part of the collected unstructured data is cleansed before analysis.
Challenges with Big Data
Rapid Data Growth: Data grows at such a high rate that it becomes difficult to extract insights from it. There is no 100% efficient way to filter out the relevant data.
Storage: Such a massive amount of data needs storage space, and organizations struggle to handle it without suitable tools and technologies.
Unreliable Data: It cannot be guaranteed that the big data collected and analyzed is 100% accurate. Redundant, contradictory, and incomplete data remain challenges.
Data Security: Firms and organizations that store massive amounts of user data can become targets of cybercriminals, so there is a risk of data being stolen. Encrypting such colossal amounts of data is therefore also a challenge.
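The three kinds of unreliable data named above (redundant, contradictory, and incomplete) can be detected with simple checks. The sketch below is a toy audit over a handful of invented records; the audit function, the field names, and the rule that two different records sharing an id contradict each other are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical customer records with deliberately planted problems.
records = [
    {"id": 1, "email": "a@example.com", "city": "Pune"},
    {"id": 1, "email": "a@example.com", "city": "Pune"},    # exact duplicate
    {"id": 2, "email": "b@example.com", "city": "Delhi"},
    {"id": 2, "email": "b@example.com", "city": "Mumbai"},  # contradicts the row above
    {"id": 3, "email": None, "city": "Chennai"},            # missing value
]

def audit(records, key="id"):
    """Count exact duplicates and list the keys of incomplete and contradictory records."""
    seen = set()
    unique = []
    redundant = 0
    for record in records:
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            redundant += 1
            continue
        seen.add(fingerprint)
        unique.append(record)

    # Incomplete: any field left empty.
    incomplete = [r[key] for r in unique if any(v is None for v in r.values())]

    # Contradictory: the same key maps to more than one distinct record.
    by_key = defaultdict(list)
    for record in unique:
        by_key[record[key]].append(record)
    contradicting = [k for k, group in by_key.items() if len(group) > 1]

    return {"redundant": redundant, "incomplete": incomplete, "contradicting": contradicting}

report = audit(records)
print(report)
```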
Applications of Big Data
Today, a huge amount of data is available, and big companies use it for business growth. By analyzing this data, useful decisions can be made in various cases, as discussed below:
1. Tracking Customer Spending Habits and Shopping Behavior: In big retail stores (like Amazon, Walmart, Big Bazaar, etc.), the management team has to keep data on customers’ spending habits (which products and brands customers spend on, and how frequently they spend), their shopping behavior, and their most-liked products, so that those products can be kept in the store. The production or stocking rate of a product is set based on how often it is searched for or sold.
The banking sector uses data on its customers’ spending behavior to offer a particular customer a discount or cashback on a product they like when they buy it with the bank’s credit or debit card. In this way, banks can send the right offer to the right person at the right time.
2. Recommendation: By tracking customers’ spending habits and shopping behavior, big retail stores provide recommendations to customers. E-commerce sites like Amazon, Walmart, and Flipkart make product recommendations: they track what products a customer searches for and, based on that data, recommend similar products to that customer.
As an example, suppose a customer searches for a bed cover on Amazon. Amazon now has data indicating that the customer may be interested in buying a bed cover. The next time that customer visits a Google page, advertisements for various bed covers are shown. Thus, the right product can be advertised to the right customer.
YouTube also recommends videos based on the types of videos a user has previously liked and watched. Relevant advertisements are shown during playback based on the content of the video the user is watching. For example, if someone is watching a tutorial video on big data, an advertisement for another big data course may be shown during that video.
3. Smart Traffic System: Data about traffic conditions on different roads is collected through cameras placed beside the roads and at the entry and exit points of the city, and through GPS devices placed in vehicles (Ola and Uber cabs, etc.). All this data is analyzed, and jam-free or less congested routes that take less time are recommended. In this way, a smart traffic system can be built in a city through big data analysis. A further benefit is that fuel consumption can be reduced.
4. Secure Air Traffic System: Sensors are present at various places on an aircraft (such as the propeller). These sensors capture data such as the speed of the flight, moisture, temperature, and other environmental conditions. Based on analysis of this data, the environmental parameters within the aircraft are set and adjusted.
By analyzing the machine-generated data of an aircraft, it can be estimated how long a machine can operate flawlessly and when it should be replaced or repaired.
5. Auto Driving Car: Big data analysis helps drive a car without human intervention. Cameras and sensors placed at various spots on the car gather data such as the size of surrounding cars, obstacles, and the distance to them. This data is analyzed, and calculations are carried out to decide, for example, what angle to turn by, what the speed should be, and when to stop. These calculations help the car take action automatically.
6. Virtual Personal Assistant Tool: Big data analysis helps virtual personal assistant tools (like Siri on Apple devices, Cortana on Windows, and Google Assistant on Android) answer the various questions users ask. Such a tool tracks the user’s location, local time, the season, and other data related to the question asked. By analyzing all this data, it provides an answer.
As an example, suppose a user asks “Do I need to take an umbrella?”. The tool collects data such as the user’s location and the season and weather conditions at that location, analyzes this data to determine whether there is a chance of rain, and then provides the answer.
7. IoT:
Manufacturing companies install IoT sensors in machines to collect operational data. By analyzing this data, it can be predicted how long a machine will work without problems and when it will require repair, so that the company can act before the machine develops serious issues or breaks down completely. Thus, the cost of replacing the whole machine can be saved.
In the healthcare field, big data is making a significant contribution. Using big data tools, data about patients’ experience is collected and used by doctors to give better treatment. IoT devices can sense the symptoms of a probable oncoming disease in the human body and help prevent it through early treatment. IoT sensors placed near a patient or a newborn baby constantly keep track of health conditions such as heart rate and blood pressure. Whenever a parameter crosses the safe limit, an alarm is sent to a doctor, who can then act remotely without delay.
8. Education Sector: Organizations that conduct online courses use big data to find candidates interested in a course. If someone searches for a YouTube tutorial video on a subject, organizations providing online or offline courses on that subject send that person online ads about their courses.
9. Energy Sector: A smart electric meter reads the power consumed every 15 minutes and sends the reading to a server, where the data is analyzed to estimate the times of day when the power load across the city is low. Based on this, the system suggests to manufacturing units and householders when to run their heavy machines, typically at night when the power load is lower, so that they pay a lower electricity bill.
10. Media and Entertainment Sector: Media and entertainment companies like Netflix, Amazon Prime, and Spotify analyze data collected from their users. Data such as what types of videos and music users watch and listen to most and how long they spend on the site is collected and analyzed to set the next business strategy.
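As one worked illustration of the applications above, the smart-meter analysis from the Energy Sector item could be sketched as follows. The readings are synthetic (one day at 15-minute intervals, with an artificially low load at 3 a.m.), and the function name lowest_load_hour is hypothetical. A real system would aggregate many meters over many days.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def lowest_load_hour(readings):
    """Return the hour of day (0-23) with the smallest average consumption."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, kwh in readings:
        totals[timestamp.hour] += kwh
        counts[timestamp.hour] += 1
    return min(totals, key=lambda hour: totals[hour] / counts[hour])

# Synthetic data: 96 readings (24 hours x 4 readings per hour).
start = datetime(2024, 1, 1)
readings = []
for step in range(96):
    timestamp = start + timedelta(minutes=15 * step)
    kwh = 0.2 if timestamp.hour == 3 else 1.0  # assumed dip in load at 3 a.m.
    readings.append((timestamp, kwh))

best_hour = lowest_load_hour(readings)
print(best_hour)
```

Grouping by hour of day is a deliberate simplification: it turns the raw 15-minute readings into a daily load profile from which the cheapest time to run heavy machines can be read off.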