Getting started with Titan using Cassandra and Solr

25.2.2016 | 4 minutes reading time

Titan offers several options for the storage backend (BerkeleyDB, Cassandra, HBase) and the underlying search engine (Lucene, Solr, Elasticsearch). Since DataStax acquired Aurelius and DataStax Enterprise Search uses Solr, I wanted to set up an environment that I can easily modify later to use DSE instead of Apache Cassandra.

Prerequisites

My Environment

I am running this setup on Ubuntu 14.04 in a Virtual Machine. I am using the latest Java version “1.8.0_73”.

Download Cassandra 2.1.12 (Titan currently supports version 2.x)
Download Titan 1.0.0
Download Solr 5.3.1

Please note: This article only covers the basics of setting up Cassandra and Solr. For more details, I recommend starting with the Apache Cassandra Getting Started guide and the Solr Quickstart.

Cassandra

For this simple setup I will use a single-node cluster, so I leave the settings in cassandra.yaml at their defaults.

To start Cassandra, unpack the downloaded Cassandra package and run the cassandra binary in its bin directory:

tar xvfz apache-cassandra-2.1.12-bin.tar.gz
cd apache-cassandra-2.1.12
bin/cassandra

Solr

Preparation

To start Solr, first unpack the downloaded Solr package:

tar xvfz solr-5.3.1.tgz

To be able to use geospatial search, we need to copy the file jts-1.13.jar, which ships with Titan, into the Solr lib folder:

cp titan-1.0.0-hadoop1/lib/jts-1.13.jar solr-5.3.1/server/lib

This step is necessary because the schema.xml provided by Titan uses geo field definitions to enable spatial queries. If we don’t copy this jar into Solr’s classpath, we will run into the following error when trying to create the Solr core:

https://gist.github.com/HashtagMarkus/32075e726e4990059c84

The second way to get rid of this error is to delete the lines in schema.xml that use a “geo” JTS property. Of course, we would then not be able to use geospatial search as shown in the official examples.

Now we can start Solr:

./solr-5.3.1/bin/solr start

To validate that Solr is running, point your browser to http://localhost:8983/solr/#/

Create Core

In general, we need to create a Solr core for each index we create in Titan. The GraphOfTheGods example, which we want to run once this setup is done, creates two indexes: “vertices” and “edges”. The “vertices” index is used for range searches on the “age” property of our vertices. The “edges” index is used to search the “reason” property on some of the edges and to run geo searches.

Before we can create these Solr cores, we need to copy the predefined Solr configuration files into Solr’s configsets folder. These configuration files are included in our Titan package.

https://gist.github.com/HashtagMarkus/8ae4221f02a895984bca

Now we can create our cores:
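A minimal sketch of this step, assuming Solr was unpacked to ./solr-5.3.1 and the Titan configuration files were copied into a config set named titan (that name is an assumption, adjust it to the folder you actually used). It relies on the create_core command of the bin/solr script. By default it only prints the commands so you can check them first; set DRY_RUN=0 to execute them.

```shell
# Assumptions: Solr lives in ./solr-5.3.1 and the config set is called "titan".
SOLR_DIR="${SOLR_DIR:-./solr-5.3.1}"
CONFIGSET="${CONFIGSET:-titan}"
# DRY_RUN=1 only prints the commands; DRY_RUN=0 really creates the cores.
DRY_RUN="${DRY_RUN:-1}"

# Create (or print the command for) a single Solr core.
create_core() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$SOLR_DIR/bin/solr create_core -c $1 -d $CONFIGSET"
  else
    "$SOLR_DIR/bin/solr" create_core -c "$1" -d "$CONFIGSET"
  fi
}

# One core per Titan index used by the GraphOfTheGods example.
for core in vertices edges; do
  create_core "$core"
done
```

Solr needs to be running when the commands are executed for real.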

To verify that the cores were created successfully, open the Solr admin panel in your browser and check that both cores are present in the drop-down list.
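The same check can be done from the command line. A small sketch, assuming Solr listens on the default address used above and curl is installed; it asks Solr's CoreAdmin STATUS endpoint about each core. The helper name core_status_url is made up for this example.

```shell
# Assumption: Solr runs on the default local address.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the CoreAdmin STATUS URL for one core (hypothetical helper).
core_status_url() {
  echo "$SOLR_URL/admin/cores?action=STATUS&core=$1&wt=json"
}

for core in vertices edges; do
  echo "Status of core '$core':"
  # Print the JSON answer, or a hint if Solr cannot be reached.
  curl -s --max-time 5 "$(core_status_url "$core")" \
    || echo "Could not reach Solr at $SOLR_URL"
  echo
done
```

A core that exists should show up with its name and index details in the response, while the status entry of an unknown core should come back empty.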

Starting Gremlin Shell and creating Titan sample data

There are several ways to use Titan. For the purpose of this tutorial I run Groovy commands inside of the Gremlin shell, which is provided within the Titan package. The Gremlin shell comes with the necessary plugins to run all example commands.

In this example I run everything on a single machine. If you want to install Cassandra and Solr on separate machines, you need to make sure those servers are reachable from the machine running Titan. You’ll also need to edit the titan-cassandra-solr.properties file to point to the correct IP addresses for both Cassandra and Solr:

vi titan-1.0.0-hadoop1/conf/titan-cassandra-solr.properties

Also make sure that the other listed properties are set accordingly. You could also use SolrCloud, but that setup would be quite different, and I will not cover it in this post.

https://gist.github.com/HashtagMarkus/88cd82dcc48bffba8e73
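As a rough sketch of what such a configuration might look like when Cassandra and Solr run on other hosts: the property keys follow the Titan configuration reference (storage.backend, storage.hostname, index.search.*), while the host, the URL and the output file name are placeholders of mine. The snippet writes a separate file instead of touching the one shipped with Titan.

```shell
# Placeholders: replace with the addresses of your own servers.
CASSANDRA_HOST="${CASSANDRA_HOST:-127.0.0.1}"
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
# Hypothetical output file; copy it into titan-1.0.0-hadoop1/conf when ready.
CONF_FILE="${CONF_FILE:-./titan-cassandra-solr-remote.properties}"

cat > "$CONF_FILE" <<EOF
# Storage backend: Cassandra via Thrift
storage.backend=cassandrathrift
storage.hostname=$CASSANDRA_HOST

# Index backend named "search": standalone Solr over HTTP (not SolrCloud)
index.search.backend=solr
index.search.solr.mode=http
index.search.solr.http-urls=$SOLR_URL
EOF

# Show the result for a quick visual check.
cat "$CONF_FILE"
```

Override the defaults on the command line, for example with CASSANDRA_HOST and SOLR_URL set to your server addresses, and compare the result with the file shipped in the Titan package before using it.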

Now that we have finished setting up each of our components, it’s time to start the Gremlin console:

cd titan-1.0.0-hadoop1
bin/gremlin.sh

To test whether our setup is correct, we now load the Titan default graph named “GraphOfTheGods”:

https://gist.github.com/HashtagMarkus/2342de47694ffb036d81

To test that our setup is working, the above example first searches for the vertex with the property “name = hercules”. Then it follows the outgoing edges to find the names of Hercules’ parents. The last query runs a geospatial search to find places within the given radius.

For a complete example of traversing this example graph, see the official Titan documentation.

Conclusion

Setting up Titan as a highly scalable graph database using Cassandra as storage and Solr as search engine can be a bit tricky. The quick start examples provided by Aurelius, especially those for using Cassandra with Solr, did not work for me out of the box. I hope this post helped you to set up a first graph environment.

Blog author

Markus Höfer

IT Security Consultant
