codecentric in the thick of it

Meetups, regulars' tables, hackathons, user groups: codecentric is far more than the sum of its employees and projects.

Behind every successful piece of software stands a strong community

Sharing knowledge, supporting young talent and exchanging ideas with peers as equals are matters close to our hearts. We believe that shared enthusiasm for innovation is doubled enthusiasm for innovation.

That is why codecentric employees enjoy mingling with the community, whether as hosts, speakers or organizers of various events. Meet us at one of the following events!

Big Data Meetup Karlsruhe

codecentric AG, Gartenstraße, Karlsruhe, Germany 15.01.2018 | 19:00

First Meeting in Karlsruhe

We are pleased to invite you to the first Karlsruhe edition of the new Big Data group. For the first talk we welcome Robin Moffat from Confluent. The second talk is given by Dominik Benz of Inovex GmbH.

Look Ma, no Code! Building Streaming Data Pipelines with Apache Kafka
Have you ever thought that you needed to be a programmer to do stream processing and build streaming data pipelines? Think again!

Companies new and old are all recognizing the importance of a low-latency, scalable, fault-tolerant data backbone in the form of the Apache Kafka streaming platform. With Kafka, developers can integrate multiple sources and systems, which enables low-latency analytics, event-driven architectures and the population of multiple downstream systems. These data pipelines can be built using configuration alone.

In this talk, we’ll see how easy it is to stream data from a database such as Oracle into Kafka using the Kafka Connect API. In addition, we’ll use KSQL to filter, aggregate and join it to other data, and then stream this from Kafka out into multiple targets such as Elasticsearch and MySQL. All of this can be accomplished without a single line of code!
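To give a feel for what "configuration alone" can look like, here is a minimal sketch of a Kafka Connect source definition for an Oracle table. It is not taken from the talk: it assumes the Confluent JDBC source connector is installed on the Connect workers, and the connector name, host, credentials, table and column names are hypothetical placeholders. Python is used only to assemble and print the JSON document; the pipeline definition itself is the configuration.

```python
import json

# Hypothetical connector definition in the JSON shape accepted by the
# Kafka Connect REST API (POST /connectors). Host, credentials, table and
# column names are placeholders chosen for illustration only.
jdbc_source = {
    "name": "oracle-orders-source",
    "config": {
        # Assumes the Confluent JDBC source connector plugin is installed.
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:oracle:thin:@//db.example.com:1521/ORCLPDB1",
        "connection.user": "connect_user",
        "connection.password": "change-me",
        # Only this table is streamed into Kafka.
        "table.whitelist": "ORDERS",
        # New rows are detected via a strictly increasing key column.
        "mode": "incrementing",
        "incrementing.column.name": "ORDER_ID",
        # Rows from ORDERS end up in the topic "oracle-ORDERS".
        "topic.prefix": "oracle-",
    },
}


def to_request_body(connector: dict) -> str:
    """Serialize the connector definition to a JSON request body."""
    return json.dumps(connector, indent=2, sort_keys=True)


print(to_request_body(jdbc_source))
```

The printed document is what would be submitted to a running Connect cluster; filtering, aggregating and joining with KSQL and the sink connectors for Elasticsearch or MySQL would be declared in a similar configuration-driven way.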

Why should Java geeks have all the fun?

Flow is in the Air: Best Practices of Building Analytical Data Pipelines with Apache Airflow
Apache Airflow is an open-source Python project which facilitates an intuitive programmatic definition of analytical data pipelines. Based on more than two years of production experience, we summarize its core concepts, detail lessons learned and set it in context with the Big Data analytics ecosystem.

Creating, orchestrating and running multiple data processing or analysis steps may make up a substantial portion of a data engineer's or data scientist's work. A widely adopted notion for this process is a "data pipeline", which consists mainly of a set of "operators" that each perform a particular action on data, with the possibility to specify dependencies among them. Real-life examples may include:
- Importing several files with different formats into a Hadoop platform, performing data cleansing, and training a machine learning model on the result.
- Performing feature extraction on a given dataset, applying an existing deep learning model to it, and writing the results to the backend of a microservice.
Apache Airflow is an open-source Python project developed at Airbnb which facilitates the programmatic definition of such pipelines. Features which differentiate Airflow from similar projects like Apache Oozie, Luigi or Azkaban include (i) its pluggable architecture with several extension points, (ii) the programmatic approach of "workflow is code" and (iii) its tight relationship with the Python as well as the Big Data analytics ecosystem. Based on several years of production usage, we briefly summarize the core concepts of Airflow and detail lessons learned and best practices from our experience. These include hints for becoming productive quickly with Airflow, approaches to structuring workflows, integrating it into an enterprise landscape, writing plugins and extensions, and maintaining it in a production environment. We conclude with a comparison with other analytical workflow engines and summarize why we have chosen Airflow. We will also put a special focus on the context of hybrid (realtime + batch) analytical platforms, and how Airflow can complement the Apache Kafka / Confluent ecosystem.
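The idea of a pipeline as operators plus dependencies, expressed as code, can be sketched in a few lines of plain Python. This is an illustration of the concept only and deliberately does not use Airflow's API; the operator functions, the shared `ctx` dictionary and the toy "model" (a mean) are assumptions made for the example. In Airflow the corresponding building blocks would be a DAG and its operators.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def import_files(ctx: dict) -> None:
    # Stand-in for importing files of different formats.
    ctx["raw"] = [" 3", "1 ", None, "2"]


def cleanse(ctx: dict) -> None:
    # Drop missing values and normalize the rest to integers.
    ctx["clean"] = [int(v.strip()) for v in ctx["raw"] if v is not None]


def train_model(ctx: dict) -> None:
    # Stand-in for model training: the "model" is simply the mean.
    ctx["model"] = sum(ctx["clean"]) / len(ctx["clean"])


# Hypothetical operators, keyed by task name.
operators = {
    "import_files": import_files,
    "cleanse": cleanse,
    "train_model": train_model,
}

# Each task maps to the set of tasks it depends on.
dependencies = {
    "import_files": set(),
    "cleanse": {"import_files"},
    "train_model": {"cleanse"},
}


def run_pipeline(operators: dict, dependencies: dict) -> tuple[list, dict]:
    """Run every operator once, after all of its dependencies."""
    ctx: dict = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        operators[name](ctx)
    return order, ctx


order, ctx = run_pipeline(operators, dependencies)
print(order)
print(ctx["model"])
```

Because the workflow is ordinary code, it can be versioned, reviewed and tested like any other module; a real workflow engine adds scheduling, retries and monitoring on top of this basic ordering.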

codecentric provides the venue, drinks and pizza.


Florian Troßbach