Big data, much like deep tech, is a nebulous term without a concrete definition. It generally refers to digital information of very large volume and high value that comes in many different forms. Common sources of big data include social media, text, and video-sharing platforms.
Image Credit: Yurchanka Siarhei/Shutterstock.com
How Large is Big Data, and How Do We Store It?
Smartphone users collectively generate an estimated 440 exabytes of data every month. Every minute, millions of Google searches, Snapchat snaps, and YouTube video views take place, producing volumes of data too large for a single traditional machine to store and process. Handling this data requires distributed software such as Hadoop and Cassandra, which catalog and organize it.
These systems take large data sets, break them into much smaller segments, and distribute those segments across many machines (nodes). Each segment is also copied onto several different nodes, so that the copies are spread evenly across the cluster. This safeguard is pivotal: if one node fails, a copy of its data still exists elsewhere and nothing is lost.
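The split-and-replicate scheme described above can be sketched in a few lines of Python. This is a toy model, not Hadoop's or Cassandra's actual logic: the block size, the node names, and the round-robin placement rule are illustrative assumptions. Real systems make these choices configurable; HDFS, for instance, keeps three copies of each block by default.

```python
# Toy model of block splitting and replication. Block size, node names,
# and the round-robin placement rule are illustrative assumptions, not
# the placement policy of any real system.


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, rotating the
    starting node so that copies are spread evenly across the cluster."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [nodes[(block_id + k) % len(nodes)] for k in range(replication)]
    return placement


def surviving_blocks(placement: dict[int, list[str]], failed: set[str]) -> set[int]:
    """Return the ids of blocks that still have at least one copy on a live node."""
    return {b for b, holders in placement.items() if any(n not in failed for n in holders)}


if __name__ == "__main__":
    data = b"x" * 1000
    blocks = split_into_blocks(data, block_size=128)
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    placement = place_replicas(len(blocks), nodes, replication=3)
    # Compare the total block count with the blocks that survive one failure.
    print(len(blocks), len(surviving_blocks(placement, failed={"node-b"})))
```

With four nodes and three copies of every block, losing any one node (or even any two) still leaves each block with a live copy; only when three nodes fail can some blocks be lost.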
Cataloging and filing this data is not done by one machine but by many machines performing these tasks in parallel, which improves speed and delivery. This approach is known as parallel processing.
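The split, process-in-parallel, merge pattern can be illustrated with a minimal map-reduce style word count. This is a single-machine sketch under stated assumptions: worker threads stand in for the separate machines of a cluster, and the function names are hypothetical. In CPython, the global interpreter lock keeps threads from running Python code truly in parallel, so production systems such as Hadoop or Spark spread the work across processes and machines instead; the overall structure is the same.

```python
# Minimal map-reduce style word count. Worker threads stand in for the
# separate machines of a real cluster (an illustrative assumption).
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk: list[str]) -> Counter:
    """Map step: each worker counts the words in its own chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines: list[str], workers: int = 4) -> Counter:
    """Split the lines into roughly equal chunks, count each chunk in
    parallel, then merge (reduce) the partial results."""
    chunks = [lines[i::workers] for i in range(workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)  # reduce step: add partial counts together
    return total


if __name__ == "__main__":
    logs = ["big data is big", "data moves fast", "big clusters process data"]
    print(parallel_word_count(logs, workers=2).most_common(2))
```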
Analysis of Big Data
Entire industries are centered around "big data analysis", which includes measuring how users interact with digital content. Designers evaluate this material to improve user experience, examining how often users interact with a given platform, when they pause or restart at a certain point, and when they stop using it altogether. These insights have informed web design, game design, software accessibility, and more.
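As a rough illustration of the kind of measurements involved, the sketch below summarizes a hypothetical playback event log: how many times each user starts or restarts playback, where pauses cluster, and at what point users stop. The event schema, event names, and 30-second bucketing are assumptions made for this example, not a standard format.

```python
# Hypothetical event log: (user_id, event_type, position_in_seconds).
# The schema and event names are assumptions for illustration only.
from collections import Counter

events = [
    ("u1", "play", 0), ("u1", "pause", 42), ("u1", "play", 42), ("u1", "stop", 120),
    ("u2", "play", 0), ("u2", "stop", 35),
    ("u3", "play", 0), ("u3", "pause", 40), ("u3", "stop", 40),
]


def engagement_summary(log, bucket_seconds=30):
    """Summarize how users interact with a piece of content: how often each
    user starts playback, where pauses cluster, and where users stop."""
    plays_per_user = Counter()   # first play plus any restarts
    pause_buckets = Counter()    # pauses grouped into time buckets
    stop_position = {}           # last stop position seen for each user
    for user, event, pos in log:
        if event == "play":
            plays_per_user[user] += 1
        elif event == "pause":
            pause_buckets[pos // bucket_seconds * bucket_seconds] += 1
        elif event == "stop":
            stop_position[user] = pos
    # Drop-off points: where users stopped, grouped into the same buckets.
    drop_off = Counter(pos // bucket_seconds * bucket_seconds for pos in stop_position.values())
    return plays_per_user, pause_buckets, drop_off


if __name__ == "__main__":
    plays, pauses, drop_off = engagement_summary(events)
    print(plays, pauses, drop_off, sep="\n")
```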
Big data analysis has become so prevalent that it is prompting changes to the scientific method itself, and machine learning is largely responsible. Traditionally, researchers formulate a hypothesis and then collect data to test it. In stark contrast, machine-learning algorithms can comb through enormous data sets to generate hypotheses. This is sometimes called "brute force classification", in which machines simply search for associations among the different elements of a data set.
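A minimal sketch of this brute-force approach: score every pair of variables in a data set for a linear association and rank the pairs, so that the strongest ones become candidate hypotheses. The data and variable names are invented, and Pearson correlation is just one possible association measure.

```python
# Sketch of "brute force" hypothesis generation: test every pair of
# variables for a linear association and rank the pairs. The data set and
# variable names are invented for illustration.
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_associations(table):
    """Score every pair of columns by |r| and return them strongest first."""
    scores = [
        (abs(pearson(table[a], table[b])), a, b)
        for a, b in combinations(table, 2)
    ]
    return sorted(scores, reverse=True)


data = {
    "hours_online": [1, 2, 3, 4, 5, 6],
    "ads_clicked": [0, 1, 1, 2, 3, 3],
    "shoe_size": [9, 7, 10, 8, 9, 7],
}

if __name__ == "__main__":
    for score, a, b in rank_associations(data):
        print(f"{a} ~ {b}: |r| = {score:.2f}")
```

Associations surfaced this way are starting points rather than conclusions: when thousands of pairs are scanned, some will look strong purely by chance, and correlation alone does not establish cause.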
Applications of Big Data
Big data has not only provided everyday users with entertainment and insight; it has also aided emergency response and saved lives. Open-source frameworks such as Hadoop and Spark allowed emergency specialists to compile a wealth of computer models and intelligence on migration patterns and wind conditions following Hurricane Sandy. The response to other East Coast disasters, such as Hurricane Elsa and Hurricane Florence, has also benefited from this intelligence, with forecasts of their outcomes available days in advance. This illustrates two qualities often attributed to big data: its value and its veracity.
Another example of health-related advances built on big data comes from the National Institutes of Health (NIH). Several years ago, the NIH introduced the Big Data to Knowledge (BD2K) program, which aims to share computational resources with other health-related industries and large-scale biomedical firms. These resources take the form of data standards, data catalogs, ontologies, and similar tools.
Companies and organizations worldwide are beginning to recognize how important the "Big Data Movement" is becoming. The UN, the WHO, and government agencies around the globe are making decisions and pursuing agendas based on big data.
Image Credit: MIKHAIL GRACHIKOV/Shutterstock.com
The Interrelationship between the Quantity and Quality of Big Data
The quality of the data that organizations look for also matters. Data gathered by professional researchers to answer specific questions is generally considered high quality. In contrast, although smartphone users produce enormous amounts of data every month, the information shared on social networking platforms is of much lower quality. Interestingly, however, the sheer volume of this data can sometimes compensate for its lack of precision.
Genetic data is a prime example, since a single human genome contains billions of nucleotide bases. Laboratory genetic analyses, such as Sanger sequencing or the tests offered by companies like 23andMe and Ancestry, are considered "high-quality data" because of the precise, specific methods they employ. However, the sheer volume of information captured by standards-compliant electronic health record (EHR) systems can outweigh the value of the former.
Although the EHR-based "big data" approach does not rely on high-tech apparatus such as PCR machines or fluorescence-based sequencing systems, its sample covers close to 100% of the population. As a result, its predictive power can exceed that of the former method, which reaches less than 1% of the population.
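The claim that volume can compensate for lower quality can be illustrated with a simplified calculation. All numbers here are invented, and the comparison assumes that the extra noise in the large data set is random and unbiased; sheer volume averages out random error but cannot remove systematic bias.

```python
# Worked comparison with invented numbers: a small, precise sample versus
# near-complete but noisier records. Assumes measurement noise is random
# and unbiased; volume cannot cancel out systematic bias.
from math import sqrt


def standard_error(noise_sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    return noise_sd / sqrt(n)


population = 1_000_000         # hypothetical population size
prevalence = 0.001             # hypothetical rate of a rare condition

small_n = population // 100    # precise method reaching 1% of people
large_n = population           # EHR-style records covering nearly everyone

# Precise measurements (sd = 1) on the small sample versus noisier ones
# (sd = 5) on the whole population.
se_small = standard_error(1.0, small_n)   # 1 / 100  = 0.01
se_large = standard_error(5.0, large_n)   # 5 / 1000 = 0.005

# Expected number of cases of the rare condition in each data set.
cases_small = small_n * prevalence        # about 10 cases
cases_large = large_n * prevalence        # about 1,000 cases

if __name__ == "__main__":
    print(se_small, se_large, cases_small, cases_large)
```

Under these assumptions, the noisier but near-complete data set yields a mean estimate with half the standard error of the precise 1% sample, and it contains about 100 times as many cases of a rare condition to learn from.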
The usefulness of big data and its analytical power are clear. However, its sheer volume poses growing challenges: as data sets grow exponentially, they become ever harder to store and handle. Moore's Law, the observation that the number of transistors on an integrated circuit (IC) doubles roughly every two years, has so far largely held and helped address this problem, but there may come a time when even this trend runs out.
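Written as a formula, with t measured in years and N_0 the transistor count at t = 0, doubling every two years gives the idealized growth curve below; over a decade this compounds to roughly a 32-fold increase. Real chips have tracked this curve only approximately.

```latex
N(t) = N_0 \cdot 2^{t/2}, \qquad \frac{N(10)}{N_0} = 2^{10/2} = 32
```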