Deploying an ISOBlue server


#1

Hey @aultac and @wang701,

In light of @auke and @roel’s recent advances with their ISOBlue, we’d like to start exploring the possibility of standing up an ISOBlue -> OADA infrastructure.

The first step is deploying an ISOBlue server: where do we start? :blush:

I scanned the repos in https://github.com/ISOBlue but am not sure which one contains the server that @auke and @roel are currently pushing their data to.


#2

@wang701 can respond on the details of which repos on GitHub have which code. In general, there are two things that the current “server” does, neither of which really requires much code.

The first is to host a running instance of Kafka, the “database” to which messages are sent from the ISOBlue whenever a connection is present.

The second is to accept a reverse SSH tunnel: the ISOBlue has a hard-coded address (which you can change to your own) to which it opens a reverse proxy over SSH whenever it is on and has a network connection. This makes it so that regardless of what network the ISOBlue is connected to (cell, wifi, etc.), you can SSH into it to get a shell for debugging, monitoring, running things, etc.; a minimal sketch of such a tunnel is below. @wang701 can chime in here too if he’s got it running anything else.
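For illustration only (this is not the actual ISOBlue code; the server address, ports, and retry interval are all made up), the reverse-tunnel behavior could be approximated like this:

```python
# Hypothetical sketch of the reverse-SSH tunnel described above; the real
# ISOBlue has its own hard-coded server address and credentials.
import subprocess
import time

SERVER = "user@your-server.example.com"  # assumption: your own server
REMOTE_PORT = 2222                       # port the server will listen on
LOCAL_SSH_PORT = 22                      # the ISOBlue's own sshd

while True:
    # -R exposes the ISOBlue's sshd as REMOTE_PORT on the server, so
    # running `ssh -p 2222 <isoblue-user>@localhost` on the server
    # drops you into a shell on the ISOBlue.
    subprocess.run([
        "ssh", "-N",
        "-o", "ServerAliveInterval=30",
        "-o", "ExitOnForwardFailure=yes",
        "-R", f"{REMOTE_PORT}:localhost:{LOCAL_SSH_PORT}",
        SERVER,
    ])
    time.sleep(10)  # connection dropped (cell handoff, etc.); retry
```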

In a future incarnation, we are hopeful that we can replace the Kafka installation with our OADA service, thus defining a standard REST API for getting telematics data to the cloud. To do that, we need to write our own “sync” for messages to replace the simplistic Kafka MirrorMaker setup on the ISOBlue, and test/optimize the current OADA implementation to handle the high-bandwidth data stream. We’re working on it! :slight_smile:
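To make the idea concrete, here is a minimal sketch of what such a sync could look like, assuming JSON-encoded messages and a made-up topic name, OADA URL, and token (this is not the planned implementation):

```python
# Hypothetical Kafka -> REST sync; topic, URL, and token are placeholders.
import json

import requests
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "gps",                               # assumed topic name
    bootstrap_servers="localhost:9092",  # the ISOBlue's local broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for msg in consumer:
    # PUT each message to a hypothetical OADA-style REST resource.
    resp = requests.put(
        "https://oada.example.com/resources/isoblue/gps",
        json=msg.value,
        headers={"Authorization": "Bearer <token>"},
        timeout=10,
    )
    resp.raise_for_status()
```

A real sync would also need to batch messages, handle offline periods, and commit offsets only after a successful upload.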

Thanks!
Aaron


#3

Thanks for the quick reply!

You’ve got some cool stuff in the works!

Looking forward to receiving further instructions. Over. :stuck_out_tongue:


#4

Heya @aultac and @wang701,

I managed to put the setup from https://www.isoblue.org/docs/data.html into a Docker image, build a container, and run the Kafka broker, whoop!

I’d like to consume some messages now, but the Public Datasets are broken: they all say that they are not implemented yet (see e.g. http://cloudradio39.ecn.purdue.edu/data/07182017_ib1_krogmeier.tar.gz). Can you have a look?

How should we go about logging data from @Auke and @Roel’s ISOBlue to this Kafka broker once we deploy it to a server? Which ports should be exposed, for instance?

PS I’ll share the Dockerfile once it’s finished.


#5

Hello @Simeon!

> I managed to put the setup from https://www.isoblue.org/docs/data.html into a Docker image, build a container, and run the Kafka broker, whoop!

That is great!

> I’d like to consume some messages now, but the Public Datasets are broken: they all say that they are not implemented yet (see e.g. http://cloudradio39.ecn.purdue.edu/data/07182017_ib1_krogmeier.tar.gz). Can you have a look?

Yes, the links on the ISOBlue web site are down; I will fix that. We repurposed cloudradio39 to host the MySQL app, which is why all the links are down. All of our Kafka logs are now stored here:

https://purdue0-my.sharepoint.com/:f:/g/personal/wang701_purdue_edu/EpOQ9StOgtJOkF3i8_7DlC4BPdYcLLnWhUhrfNku3j4QRw?e=GHrr2r

Please have a look. Our OneDrive has a 15 GB file size limit, so if you see a Kafka log that spans multiple tars, make sure to merge them into one and then extract the merged tar; a sketch of how to do that is below.
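For example, something like this could do the merge and extraction (the .part0/.part1 naming is just an assumption; adjust it to the files you actually downloaded):

```python
# Concatenate split tar parts back into one archive, then extract it.
import glob
import shutil
import tarfile

parts = sorted(glob.glob("kafka_log.tar.part*"))  # assumed naming scheme

with open("kafka_log.tar", "wb") as merged:
    for part in parts:
        with open(part, "rb") as f:
            shutil.copyfileobj(f, merged)

with tarfile.open("kafka_log.tar") as tar:
    tar.extractall("kafka_log")
```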

> How should we go about logging data from @Auke and @Roel’s ISOBlue to this Kafka broker once we deploy it to a server?

Most of the instructions for setting up an ISOBlue to mirror its selected Kafka topics to a remote Kafka broker are specified here:

> Which ports should be exposed, for instance?

By default, we follow this scheme for the Kafka mirroring process from an ISOBlue to a remote server:

```
ISOBlue_broker_port    -> server_broker_port    (default: 9092)
ISOBlue_zookeeper_port -> server_zookeeper_port (default: 2181)
```

For the ISOBlue broker port and the ZooKeeper port, you can pick whatever ports you want; just avoid the well-known ones (https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers).
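Once your server is up, a quick way to sanity-check that the broker and ZooKeeper ports are actually reachable (the hostname is a placeholder):

```python
# Check that the default broker/ZooKeeper ports are open on your server.
import socket

HOST = "your-server.example.com"  # placeholder: your own server

for name, port in [("Kafka broker", 9092), ("ZooKeeper", 2181)]:
    try:
        with socket.create_connection((HOST, port), timeout=5):
            print(f"{name} port {port}: open")
    except OSError as e:
        print(f"{name} port {port}: unreachable ({e})")
```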

On our ISOBlues, we use the following port configurations:


#6

Hi @wang701, a short update: I managed to download some demo data, “load” it in Kafka and consume some messages. I also pushed the whole thing to a Digital Ocean droplet and am ready to configure and test the ISOBlue -> Cloud sync.

There was a small issue, though, that ate some of my time and effort: Kafka’s default log retention period is 7 days, which means that if you load one of the demo datasets (which are all older than 7 days), it will proceed to purge the logs, thereby throwing away the messages.

If you are unaware of this behaviour (and of Kafka in general), as I was, it looks like the consumer is broken. The fix is to increase log.retention.hours in Kafka’s server.properties config file prior to starting the broker and loading the logs.
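For example (the value is just what covered my dataset: roughly six months):

```
# in config/server.properties, set before starting the broker:
log.retention.hours=4380
```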


#7

Hello Simeon,

> a short update: I managed to download some demo data, “load” it in Kafka and consume some messages. I also pushed the whole thing to a Digital Ocean droplet and am ready to configure and test the ISOBlue -> Cloud sync.

That is great!

> There was a small issue, though, that ate some of my time and effort: Kafka’s default log retention period is 7 days, which means that if you load one of the demo datasets (which are all older than 7 days), it will proceed to purge the logs, thereby throwing away the messages.

This is the intended behavior for Kafka. For normal clusters, people usually set up Kafka as the main message bus to guarantee ordering, but they don’t normally store the data inside Kafka forever. Instead, they set up microservices that consume messages from Kafka and store them in a database of their choice. Sometimes they also set up microservices that filter or process messages consumed directly from Kafka and store the processed data in a dedicated database; a toy example of this pattern is sketched below.
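As a toy example (the topic name and schema are assumptions), a consumer that drains messages into SQLite so they outlive Kafka’s retention window could look like:

```python
# Toy consumer-to-database microservice; topic and schema are made up.
import sqlite3

from kafka import KafkaConsumer  # pip install kafka-python

db = sqlite3.connect("telematics.db")
db.execute("CREATE TABLE IF NOT EXISTS messages (ts INTEGER, payload TEXT)")

consumer = KafkaConsumer(
    "gps",                             # assumed topic
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",      # start from the oldest retained log
)

for msg in consumer:
    db.execute(
        "INSERT INTO messages VALUES (?, ?)",
        (msg.timestamp, msg.value.decode("utf-8", errors="replace")),
    )
    db.commit()
```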

> If you are unaware of this behaviour (and of Kafka in general), as I was, it looks like the consumer is broken. The fix is to increase log.retention.hours in Kafka’s server.properties config file prior to starting the broker and loading the logs.

If you want to keep the topics and their logs forever (I don’t recommend doing so on your cluster), you can set log.retention.hours=-1, as seen here.


#8

Good to know! Maybe it’s wise to add a note about the retention period in relation to the demo data to https://www.isoblue.org/docs/data.html#setup-a-zookeeper-and-a-kafka-broker, to prevent others from scratching their heads over disappearing messages.

I set the retention period to 6 months to cover the demo dataset I downloaded.