Apache Kafka

Apache Kafka is already the realtime data backbone in many companies.

In the past I learned that data can be processed either in parallel or with transactional guarantees. Kafka showed me that there is often a pragmatic middle way. The question is: does it really matter whether patient A’s or patient B’s data is processed first? In many cases it does not, as long as all data belonging to a single patient is processed in the correct order.

Or, in Kafka terms: the data is partitioned by patient ID, so all data within a partition is processed in order and transactionally, in parallel to other patients’ data.
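The partitioning idea can be illustrated with a small sketch. It does not use a Kafka client at all; it only simulates key-based routing in plain Python. The partition count, the event payloads and the helper names (`partition_for`, `route`, `events_for`) are assumptions made up for this illustration, and CRC32 is merely a stand-in for the hash a real Kafka partitioner would apply to the key.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumption: arbitrary partition count for the illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events):
    """Append each (patient_id, payload) event to the partition of its key."""
    partitions = defaultdict(list)
    for patient_id, payload in events:
        partitions[partition_for(patient_id)].append((patient_id, payload))
    return partitions


def events_for(patient_id, partitions):
    """Return one patient's payloads in the order stored in its partition."""
    own_partition = partitions[partition_for(patient_id)]
    return [payload for pid, payload in own_partition if pid == patient_id]


# Hypothetical event stream: patients A and B are interleaved on arrival.
events = [
    ("patient-A", "admitted"),
    ("patient-B", "admitted"),
    ("patient-A", "lab result"),
    ("patient-B", "discharged"),
    ("patient-A", "discharged"),
]

partitions = route(events)
print(events_for("patient-A", partitions))
print(events_for("patient-B", partitions))
```

Because the same key always maps to the same partition, each patient's events keep their relative order, while different partitions could be handled by different consumers at the same time.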

That got me hooked on Apache Kafka. Its other concepts, such as load balancing and KStreams, are clever as well.

SAP with Big Data

There are many options to combine Big Data with SAP including cheap ones.

SAP follows the concept of openness, hence there are many options to integrate Big Data with SAP.

The downside of this freedom of choice is having to choose the right approach: which products to use, which technologies are involved, what the business needs, and how to involve users.


  • An open-source-minded person might do all the transformations in e.g. Apache Spark and load the results into Hana via SQL DataSets.
  • A Hana team might connect to Hadoop using the Spark Connector.
  • An SAP person might suggest using SAP Data Hub for everything.

All approaches have pros and cons. Navigating through this minefield needs a lot of background knowledge and experience.

SAP Data Integration

Pick and use the best-suited SAP product for data integration.

SAP provides various products in the area of Data Integration. Some focus on pure Data Integration, others on Process or System Integration. The trick is to pick the one best suited for the given task.

(e.g. see here)

Of course it would be nice if one tool could solve all problems, but the requirements are too diverse for that. In fact, one of the reasons there are so many options is that SAP tries to help customers by providing tools that perfectly match a typical use case. However, as soon as the use case does not match the product’s assumptions… things get hard.

On the other hand, general-purpose tools make everything harder than it needs to be. You need to find the one that fulfills your needs best.

If in doubt, we can brainstorm together. That will be much cheaper than what I have seen recently: products bought for >100k that provide no value.


SAP Hana Smart Data Integration is one of the hidden gems, useful for many situations.

The vision of Hana SDI was to design transformations once and let the user decide on the qualities at activation time. The user could choose:

  • Federation: Data is current because the source system is queried directly, but users unintentionally stress the source system with their queries. The queries return at source-system speed, if that.
  • Batch ETL: Move the data into the Hana instance at the highest possible speed, so queries run at Hana speed. But the data might not be current, nor transactionally consistent, at any given time.
  • Realtime Push: The source system is asked to push changes to Hana and these changes are incorporated into the Hana target tables. Hence all queries will run at Hana speed and will return current data. Depending on the transformation, this can be a lot of work to accomplish.
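The trade-off between the three styles can be made concrete with a toy model. This is not the Hana SDI API; it is a plain Python simulation in which the `Source` class, the helper functions and the sample row are all invented for illustration.

```python
class Source:
    """Hypothetical source system that notifies subscribers about changes."""

    def __init__(self):
        self.rows = {}
        self.subscribers = []

    def upsert(self, key, value):
        self.rows[key] = value
        for callback in self.subscribers:
            callback(key, value)


def federated_query(source, key):
    """Federation: every query goes to the source, so data is always current."""
    return source.rows.get(key)


def batch_load(source):
    """Batch ETL: copy a snapshot; it stays as-is until the next load."""
    return dict(source.rows)


def subscribe_realtime(source):
    """Realtime push: initial load, then the source pushes each change."""
    target = dict(source.rows)

    def apply_change(key, value):
        target[key] = value

    source.subscribers.append(apply_change)
    return target


source = Source()
source.upsert("order-1", "open")

batch_target = batch_load(source)          # snapshot taken now
push_target = subscribe_realtime(source)   # kept in sync from now on

source.upsert("order-1", "shipped")        # change after the batch load

print(federated_query(source, "order-1"))  # current, but hits the source
print(batch_target["order-1"])             # stale until the next batch run
print(push_target["order-1"])              # current, served from the target
```

In this sketch the batch copy still shows the old status after the source changes, while both the federated query and the push target show the new one. Only the federated query touches the source at query time.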

By using Hana Smart Data Access as the foundation and extending it, the first version of Hana SDI was released within a year (Nov 2014, with Hana 1.0 SPS9). It allowed users to add their own adapters and to use realtime push with various sources.

The other concept, transitioning between the three integration styles, was started but never completed. Only now is this concept gaining traction again with other Hana teams.

So if there are any questions regarding Hana SDI, this is your chance to talk to the very creator of the product. I’m the one to consult on this.

SAP Data Services

SAP Data Services is a very powerful ETL tool, per the Gartner Magic Quadrant. We can help you use it most efficiently.

SAP Data Services is a very powerful ETL tool, as can be seen in the Gartner Magic Quadrant for Data Integration.

I was part of the product team in various roles: developer, product manager, performance expert, troubleshooter, consultant, and trainer. Hence I can provide deep insights and build dataflows very quickly.

A suggested engagement is to get the customer team started, review the work frequently, and build the most complex transformations together. This way the project team learns while doing its job and will be most efficient.

During my days as product manager, one of my tasks was to help the community, for example by answering questions in the Business Objects Forum BOB. The majority of the related content in the SCN Wiki was created by me.