“Big Data” – another overused marketing buzzword…

For me the term Big Data means multiple things:

  • Lots of data. The volume defines the scenario because it has an impact on the technology: it is too much data for a single database or a single server.
  • Computer clusters & parallel processing. To cope with these amounts of data, the data needs to be processed in parallel across multiple reasonably cheap servers.
  • Never-used data. While a Data Warehouse tries to find information in database-stored data via nice visualizations, a Big Data project needs more automation: statistical methods, machine learning and visualizations covering large amounts of data. Everything is aimed at finding the root cause of a pattern in the data, in the hope that it can be influenced for a business benefit.
  • Non-linear, non-SQL. A “sum of revenue per region” is a typical Data Warehouse query. The data used in Big Data projects is not suited to such simple queries; it needs far more complex processing. Example: in a weblog, find when the user opened a page, when the page loaded more content and when the user opened another page. The timing of these events provides clues about how long the reader spent on each page.
  • New business models. A car insurer asking for the driver's age and the annual driving distance is limited in its pricing models. But an insurer that takes the driving style into account, based on detailed logging of each ride, can offer a customer-centric pricing model. This has lots of side effects beyond the technologies used: data privacy (GDPR), fraud, user acceptance, … all need to be considered.
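
To make the weblog example from the “Non-linear, non-SQL” point a bit more tangible, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the record layout (user, timestamp, event, page), the event names "open" and "load_more", and the function name time_per_page are made up, and a real Big Data project would run this kind of logic distributed across a cluster rather than in a single process. The idea is simply that the time spent on a page is the gap between one "open" event and the next "open" event of the same user.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical weblog records: (user, ISO timestamp, event, page).
# The layout and the event names are assumptions for this sketch.
events = [
    ("alice", "2024-01-01T10:00:00", "open", "/home"),
    ("alice", "2024-01-01T10:00:20", "load_more", "/home"),
    ("alice", "2024-01-01T10:01:00", "open", "/pricing"),
    ("alice", "2024-01-01T10:03:30", "open", "/contact"),
    ("bob", "2024-01-01T10:00:10", "open", "/home"),
    ("bob", "2024-01-01T10:00:40", "open", "/pricing"),
]


def time_per_page(events):
    """Sum the seconds readers spent on each page.

    The time on a page is estimated as the gap between a user's "open"
    event and that same user's next "open" event. The last page of each
    user is skipped because there is no later event to measure against.
    """
    # Group the events per user so the sequences are not mixed up.
    by_user = defaultdict(list)
    for user, ts, event, page in events:
        by_user[user].append((datetime.fromisoformat(ts), event, page))

    seconds = defaultdict(float)
    for visits in by_user.values():
        # Order each user's events by time, then keep only page opens.
        visits.sort(key=lambda v: v[0])
        opens = [(ts, page) for ts, event, page in visits if event == "open"]
        # Pair every open with the following open of the same user.
        for (start, page), (end, _next_page) in zip(opens, opens[1:]):
            seconds[page] += (end - start).total_seconds()
    return dict(seconds)


totals = time_per_page(events)
print(totals)
```

Note that this is already awkward to express as a simple aggregate query: each row has to be compared with the next row of the same user, and the "load_more" events would need extra handling if they were to count as a sign of active reading.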