From Raw Data to Usable Results: Build, Rinse, and Repeat
This talk focuses on the data itself: from its generation, through collection, storage, tagging, metadata capture, and normalization, to processing and reprocessing, and finally to archiving, indexing, searching, and exchange. Topics include sources of data (especially in the life sciences), transfer speeds, delivery validation mechanisms, local caches, data integrity, rule engines, local vs. cloud vs. hybrid deployments, storage layers, file systems, workflows, performance, and the software that ideally ties it all together instead of adding complexity.
Talk of Big Data is now reaching the same level of saturation that Cloud reached in 2010, and, similarly, it means different things to different people. As consultants, we are exposed to a variety of environments, and knowledge travels both ways: we teach our clients what we have learned on previous projects, and we learn from them the specifics of what they are setting out to accomplish. Sitting in this melting pot of big-data knowledge, with hundreds of available technologies that no single person, or even a single company, could fully evaluate, we continuously refine and rethink our approaches, with the goal of using the best available tools for the job. As a result, we gain deep insight into the functionality, usefulness, and usability of these tools, not as casual users but by scrutinizing the technology and integration aspects as well.