Big Data - Lifeline of the Modern World
We are in the midst of a major shift in the history of human civilization, one in which we are redefining how we live our lives and how we interact with each other. We are growing up in a dynamic, technology-driven environment where the production and consumption of data is at the core of most activities. Every day we unknowingly make use of data in a variety of ways, from finding trending news, to watching popular YouTube videos, to checking traffic conditions. However, I think we have only scratched the surface so far, and in the future a normal day may look more like a scene from a sci-fi movie. Hopefully, humanity will still exist if robots control the future, but that is a discussion for another day.
In this paper I want to understand the various forms of data generated in today’s world and how we make use of them. I then want to explore how big data could impact our future lifestyle. Next, I want to look at some of the tools that are available to collect, analyze, and visualize data. My final goal is to take a volume of data collected through social media and find some interesting trends in it using big data analysis tools.
The Role Big Data Plays in Today’s World
It is intriguing to me how we unknowingly make use of different forms of data in our everyday lives without thinking about how the data was generated, how it was processed, and how it ended up in the form we use. For example, when we check the weather outlook in the morning, we don’t realize how many weather sensors and satellite images supplied the data that was run through different weather models in order to predict a warm, sunny day for us. It is almost magical. Social media is probably the most familiar place where this explosion of data shows up. Data posted to social media is analyzed and used to find our likes, our dislikes, and other aspects of our lives.
You can imagine the messages, pictures, and videos we post on social media as small pieces of a puzzle that, when put together with other similar pieces, can paint a picture of society’s behavioral patterns. For example, you could analyze all social media posts by people between the ages of 20 and 30 and observe which songs they like or which presidential candidate they prefer. There are endless ways the jigsaw puzzle could fit together to answer the questions that a business or community organization may be asking.
How Big Data Can Make an Impact in the Future
Big data is already making a big impact on the world today. In the future, however, it could change every single part of our lives. We are already at the cusp of an era of driverless cars, but I can picture a future where cars become ‘smarter’ and collaborate with each other to improve our driving experience even more. ‘Smart’ cars of the future could constantly share data about traffic and road conditions (potholes, slippery surfaces, etc.) to find the fastest and most comfortable route to the destination. In addition, a car could check your calendar to find where you need to go next, maybe even get your coffee ready just the way you like it, and tell you the news that you want to hear. In addition to being super efficient, future cars could take human error out of driving, creating safer roads for everyone.
On top of saving lives on the road, big data could help save lives in the hospital and at home. Patients with conditions like diabetes and high blood pressure often have trouble keeping their condition under control. Future wearable devices could measure electrical signals from the body and send them to care providers, where medical experts could analyze them and prevent complications such as a heart attack or stroke before they happen. This could save many more lives.
Tools to Store, Analyze, and Visualize Data
We are generating huge amounts of data from a variety of sources every day, and it is stored and analyzed to answer different questions. The pace at which data is being generated makes it hard to keep it in traditional storage solutions, such as files or databases, that are limited by the disk space of a single machine. A different set of technological solutions is needed, one that allows data to be distributed across many machines. Some of the popular tools that allow data to be stored in a distributed environment are NoSQL data stores like Elasticsearch and MongoDB and distributed file systems such as the Hadoop Distributed File System (HDFS). One interesting thing I found about Elasticsearch is that it can take input data from many different sources, including Twitter feeds.
Once data is stored across many machines, we need ways to analyze it across those machines in a distributed manner. There are many ways to do that, including writing custom software and using capabilities that already exist in data storage management systems. Elasticsearch provides many of these capabilities for finding summaries and patterns in data. You can use regular query functions to find documents that contain certain keywords. However, what makes it more powerful is its ability to aggregate the results of queries. For example, we can very easily find the total number of tweets that contain the keywords ‘common core’ and ‘good’, or the number that contain ‘common core’ and ‘bad’. With this we could get a rough measure of favorable and unfavorable reactions to Common Core. Elasticsearch also supports aggregating data by geographic location. This means we could find where in the country people consider Common Core good or bad.
Finally, after all the data is stored and analyzed, it is also important to present it in a format that is easy to understand and visualize. Charts, histograms, and graphs are some common ways to represent analysis results.
There are many tools available to do that, including Microsoft Office. However, there are also tools that can show aggregated results on a map. For example, you could represent the number of high school dropouts in each state by shading the states on a map in different colors, with a darker color representing a higher number of dropouts. ArcGIS Online is a program that provides many features for visualizing big data analysis results in a way that is easy to understand.
My Big Data Project
I am not ashamed to admit that I am a big NFL fan, so it didn’t take me long to pick my big data project. I want to profile the fans of NFL teams who are active on Twitter: where they live, how they are distributed by gender and age, and which team has the most active online fans. I plan to use a Logstash plugin with Elasticsearch to collect tweets that mention NFL teams. I then plan to use Elasticsearch aggregation queries to summarize the results for various parameters. Finally, I hope to visualize the output using ArcGIS Online, which is available free to high school students.
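To make the aggregation step of my plan more concrete, here is a minimal sketch in Python of what a search request for counting tweets per team might look like. It only builds the request body; it does not connect to Elasticsearch. The field name "text" for the tweet body and the function name build_team_mentions_query are my own assumptions for illustration, and the real field names would depend on how the tweets end up being indexed.

```python
import json

# Assumption: each indexed tweet stores its message body in a field called "text".
TWEET_TEXT_FIELD = "text"


def build_team_mentions_query(team_names):
    """Build an Elasticsearch search body that counts tweets mentioning each team.

    Hypothetical helper for illustration; it uses a "filters" aggregation with
    one named "match_phrase" filter per team.
    """
    named_filters = {
        name: {"match_phrase": {TWEET_TEXT_FIELD: name}}
        for name in team_names
    }
    return {
        "size": 0,  # ask only for the aggregated counts, not the tweets themselves
        "aggs": {
            "mentions_by_team": {
                "filters": {"filters": named_filters}
            }
        },
    }


if __name__ == "__main__":
    body = build_team_mentions_query(["Green Bay Packers", "Dallas Cowboys"])
    # Print the request body as JSON so it can be inspected or sent to a search endpoint.
    print(json.dumps(body, indent=2))
```

A body like this could be sent to the search endpoint of whichever index holds the tweets, and the response would contain one count per team name. The same pattern, with one filter for ‘common core’ plus ‘good’ and one for ‘common core’ plus ‘bad’, would cover the keyword example from the tools section.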