VoluntaryTracking.Enabled = true;

If you’ve been using Google Maps on your phone, you probably know (or maybe you don’t) that there is a setting which allows Google apps to use your device’s location any time the device is on.

googlelocsettings

 

Or if you prefer the conspiracy theory version, you can read this.

I simply think of this in a geeky way, which is the title of this post. Since I had enabled voluntary tracking of my whereabouts, why not make good use of it and mash it all up in my Project GetFitY’all?

To get your location history as a time series of lat/long data points, go to the Google Maps Location History page. You have to sign in with your credentials first, of course, so you only see your own location history.

viewlochist

  1. Select the date/time range from the drop-down list.
  2. To get a comprehensive list of location data points, click the link that says “Show All Points”.
  3. Click “Export to KML”. This downloads an XML file in the Keyhole Markup Language (KML) format.

To make sense of these location history data points, the best approach is to import them into PowerQuery. The steps were described in my previous posts about mashing up data in PowerQuery; the only difference is that this time I open the downloaded KML file as an XML file. It should be a simple matter of expanding the 2 columns in PowerQuery which represent the <when> and <coord> elements, but for some reason PowerQuery expands the tables within those 2 columns in a weird way: for each <when> element it expands all of the <coord> elements, effectively a cross join. So with 1000 data points I ended up with a total of 1000 × 1000 = 1 million rows. I referred to this blog post to try to expand the tables within the columns, but to no avail in my case.

Hence I used a workaround: a manual way of retrieving all the <when> rows and <coord> rows in two separate queries, copying and pasting the combined data into a separate spreadsheet, and finally saving it as a CSV file. Then I went back to PowerQuery to import the CSV file.
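If you prefer to script that step instead, here is a minimal sketch in C# of pairing up the <when> and <coord> elements and writing them out as a CSV. It assumes the elements sit inside a gx:Track with the standard KML 2.2 and Google gx extension namespaces, which is what the exported file typically looks like; this is an illustration, not the workaround I actually used.

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

class KmlToCsv
{
    static void Main()
    {
        XNamespace kml = "http://www.opengis.net/kml/2.2";
        XNamespace gx = "http://www.google.com/kml/ext/2.2";

        XDocument doc = XDocument.Load("LocationHistory.kml");
        XElement track = doc.Descendants(gx + "Track").First();

        // <when> and <coord> appear in matching order inside the track,
        // so pairing them by index avoids the cross join problem.
        var whens = track.Elements(kml + "when").Select(e => e.Value).ToList();
        var coords = track.Elements(gx + "coord").Select(e => e.Value).ToList();

        using (var writer = new StreamWriter("LocationHistory.csv"))
        {
            writer.WriteLine("when,longitude,latitude,altitude");
            for (int i = 0; i < Math.Min(whens.Count, coords.Count); i++)
            {
                // gx:coord values are space-separated: "longitude latitude altitude"
                string[] parts = coords[i].Split(' ');
                writer.WriteLine("{0},{1},{2},{3}", whens[i], parts[0], parts[1],
                    parts.Length > 2 ? parts[2] : "0");
            }
        }
    }
}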

Next I need to retrieve my Fitbit data points. The good thing is I already have a REST endpoint which does that for me. The REST API was implemented as a node.js app published at http://getfityall-api.azurewebsites.net/fitbit. I pass in the query strings which consist of the date and time range and VOILA!

Create a PowerMap, and I get this slightly different visualization below.

googlochistmashup

So there you go, the results of VoluntaryTracking.Enabled = true; and then mashing your Google location history with Fitbit data.

The Internet of Things – drawing parallels between the past and now to predict the future

When I was dabbling with the Internet of My Things in my little hobbyist project, GetFitY’all, there were many tell-tale signs that I had experienced this before, like déjà vu. It prompted me to ask what the differences are between the past and now, and whether I could possibly predict what the future holds for the IoT ecosystem, in my most honest and humble opinion.

Devices have always been connected one way or another. My earliest experience of mucking around with devices transports me back over 13 years, to a time when I had a payphone sitting on my work desk. It was a big green device, so much so that it obscured the view of me dozing off behind my desk (how nice to doze off in the office to get inspiration on how to get things right with the payphone :P). The payphone had IP connectivity, and my work as the tech lead on this project was to repurpose my company’s own pride and joy, a WAP browser called WAPman, from a client-side to a server-side implementation that could send commands to the payphone and also listen to events from the payphone (such as key presses and when the handset was hung up). The communication protocol on the wire was ASN.1. The only way to debug the payload was to eyeball the data structure, which was encoded in hexadecimal numbers. *phew* am I glad I didn’t have to wear glasses after that project. I was treating the LCD display on the payphone as a remote display, rendering bitmaps and displaying text through specific data structures encoded in ASN.1 and sent to the payphone. Some of the optimization involved sending the commands in batches so as to reduce network latency. My biggest issue in my server-side implementation was SCALABILITY!

pay-phone-1

Scalability was an issue because I had to learn, within a very short time frame, how to repurpose a WAP browser which was essentially a client app written in C++ and then expose an API implemented in a socket server. The socket server had to handle multiple connections from the payphones and scale out processes on-demand to meet the demands of the payphone connections. I certainly did not have any internet scale back then; I only had 2 Sun Solaris servers powering this up in the customer’s datacenter (DC). The good news was that the project was launched and I lived to tell the tale. It may be a slight exaggeration, but I still remember the night when I almost froze in the customer’s DC. What happened was that the customer’s PM said there was no time to deploy my code to the servers the next day; I had to do it that night. So he swung by my office at Amoy Street in Singapore, and off I went in his MPV to their DC at Tai Seng Drive.

Fast forward to now: if I look at the state of things from a macro level, what has remained the same is the following: we still have “things”, connectivity, server-side processes and people. However, the pace of doing things is a lot faster. Let’s look at this and I’ll try to draw parallels from each of those perspectives:

  • “Things” – devices have gotten a lot smaller and personal. From a communication device standpoint, who uses a payphone these days anyway? Do you even have a land-line phone at home? Things are very mobile, and the biggest challenge is battery life.
  • Connectivity – IP connectivity is king, but it is not to be taken for granted; connectivity can still be sparse, so just accept that “things” may be only occasionally connected. Things can connect via other protocols such as ZigBee, Z-Wave and, most recently I heard, the Thread consortium. Connectivity could be one-way or two-way, either directly with the server-side processes or through a device gateway.
  • Server-side processes – the order of the day for server-side processes is that they must be scalable, secure and robust. It only makes sense to reinvent the wheel less here, because a horizontal platform could make things easier and give faster time-to-market for an ISV or SI building an IoT vertical solution. If I had a good cloud-scale IoT horizontal platform, I could easily repurpose my WAP client app as the logic behind a command & control module within a horizontal IoT platform/framework.
  • People – ultimately we are the reason why “things” exist, why “things” are connected, and why “things” are managed and “ruled” through some server-side processes. People pretty much determine whether we want our “things” to be connected and to produce insights and productivity in many ways that make our lives better. Along with that comes some of our resistance, of course.

There are many more parallels that one could draw between the past and now. As for predicting the future, I’ll keep my humble opinions for a later post. What follows this post is a more technical one about another way in which I am mashing location history data points with my Fitbit activity data points.

So I want to be an IoT device maker….

So far so good. I’m pretty happy with the Internet of My Things (IoMT) as I’ve got my wearable devices and a smartphone packed with sensors, with good fitness apps installed on it, and I’ve been able to visualize it all on a PowerMap. But my curiosity expands: wouldn’t it be good if I could “make” my own thing? I’d been reading, watching and hearing a lot about enthusiast DIY projects built around programming your own electronics board. Sounds like a whole lot of fun. I’d been hearing about Arduino, Raspberry Pi and Intel Galileo (I found out about the free giveaway board at the Windows Developer Program for IoT a tad too late). So before I plunk down $50 of my hard-earned savings on a board, I must make sure I have the right board for my GetFitY’all project extensions. This article sums it up pretty well.

At this point, I’m more inclined towards a Raspberry Pi B+ board, but then there would be external sensors that I need to buy. Perhaps more important than that, I’m also thinking about what would be a good extension to my project. Here are some of my initial ideas:

  • Extend from the world’s smartest bike light project. You can read about it here. When I ride it’s less about the speed and more about watching my heart beat rate (HBR) so that it doesn’t exceed the so-called maximum HBR for my age, which is a simple calculation of 220 – my age. There are other methods to calculate this, but I usually set my max to be around 180 to have a safe buffer just in case. I want to enjoy biking for a lot longer rather than fainting and potentially killing myself faster; that’s one thought, though there are other things that could harm me too. Another indicator I like to see is elevation, and it would be good to find out how much I had climbed since the last moment I almost gave up and wanted to just get off my bike and push. Knowing how much higher I’d climbed would be good encouragement to continue pedaling to the top.
  • Have my board programmed to automatically alert my family that I’d finished my ride and that I’m riding or driving home, including ETA. If it’s winter, it may be good to get that heater warming the bathroom so that it’s all nice and warm when I take my shower or a hot tub bath. 🙂
  • Project turn-by-turn navigation onto the ground as I bike. But then this really only works when I’m navigating at night, doesn’t it? So I’m not too sure how practical this is yet.

Let’s pause for a while here while I gather more thoughts about the ideas and what seems more practical and fun to extend Project GetFitY’all!

IoT is an ecosystem play between partners

As I was alluding to in my previous post about moving up the IoT value chain, the opportunities are aplenty in the IoT space because there is a healthy ecosystem of different players, from the device manufacturers, to the SIs/ISVs who provide the vertical solutions, to the platform/infrastructure providers. Ultimately it is an ecosystem play. I’m not saying that there is no competition; surely competition exists to make the ecosystem a much more vibrant place. The best players I know are inspired by their competitors (as opposed to being delusional about their competitors) because it helps them get their own act together and be serious about producing valuable innovation.

I had the pleasure of partnering with many great software players in my previous life as a technical evangelist. One of the earliest adopters of IoT was this ISV. The COO of this ISV was a visionary. He wore many hats, so he wasn’t your typical COO who just wanted to focus on scorecards, KPIs and traffic light indicators (oh, I have a good joke about the watermelon effect but let’s save that for another post). He wanted to build an IoT practice. This was back in 2009; not that IoT didn’t exist then, but it was something I had only read about from some conference or symposium. I had not seen any solutions in this space at all.

He was super excited when he told me about the tons of devices out there that need to be connected, sounding pretty philosophical about the Internet of Things. He talked about getting his engineering team to focus on building firmware for devices, essentially agent software that connects to his IoT middleware. I was pitching the idea to him of leveraging some Platform-as-a-Service (PaaS) offering in the cloud and just focusing on the core assets of his IoT practice: an IoT middleware and a buzzing professional services unit. Coincidentally I had a colleague from the product team who was coming to run a workshop and a few partner architectural sessions for my partners, and he is a real expert and guru in middleware, messaging systems and how to scale them out really well on the cloud. He was none other than the bloke behind this blog which I follow, Clemens Vasters. Mr COO was delighted and he had one and only one request: we needed an NDA before we sat down in front of a whiteboard! Woah, this got exciting. I happened to be really good at getting partners to sign NDAs so we got one signed in no time! LOL

The session was intense, I must tell you, though I’ll spare you the details. But the stuff we were talking about is nothing new NOW. What’s available NOW is a lot of infrastructure and platforms that are horizontal in nature, and this allows an IoT practice to focus on their vertical solution’s time-to-market.

When I was building my IoT PoC, GetFitY’all, I was pleasantly surprised by how much I could achieve in such a short time. I was able to connect to my devices and sensor apps, and harness the data through friendly self-analytics tools. I am missing other pieces for sure, such as the ability to command & control my devices and to configure the services via rules and workflow. That’s alright, I look forward to all the good stuff I can leverage along the way as I extend the project.

 

PowerQuery invokes a GetFitYall API endpoint, and fun with PowerMap

Is that even possible? Yes, and I’m talking about invoking it from within the PowerQuery add-in in Excel 2013, and then mucking around with the data, which is represented in JSON. Pretty awesome, I would think. To the layman: don’t worry about what this JSON thing is, it’s all transparent to you, just consume the data.

In my previous post, I wrote about the REST API which I had exposed; it allows mashup on-demand, which is perfect for self-analytics using PowerBI. Here’s how you could consume this API from within PowerQuery specifically. It’s just another source of data, like how I had retrieved the data from Azure Table Storage. Here are the steps:

1. From the PowerQuery ribbon, click “From Web”. Then enter the URL. The URL I’m entering is the REST endpoint I have. It should be HTTPS but then this is just a PoC so I’m keeping things simple here. I pass some query strings in the URL too.
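For example, the URL could be the same request I use later in this post when documenting the API:

http://getfityall-api.azurewebsites.net/mashup?ondate=2014-08-04&aftertime=10:00&beforetime=23:00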

fromnode

2. Click List which contains an array of mashed up activity data points. Do NOT click “Into Table”, at least not yet.

powerquery-step1

 

3. Now that you have expanded the List into a row of records, click “To Table”.

powerquery-step2

4. Then you see the following dialog box, just click OK. No worries, it’ll be fine.

powerquery-step3

5. Select All Columns.

powerquery-step4

6. Fix the data type for the fields you care about, especially those you want to use to visualize in PowerMap. Start with datetime. Click the column header, then on the ribbon, select Date/Time as the Data Type.

powerquery-step5

7. Fix Steps column as well. Choose Whole Number. This is because you don’t have a fraction of a step, just steps. 🙂

powerquery-step6

8. Fix Calories column, and set Data Type as Decimal Number.

powerquery-step7

 

9. At the ribbon, click Close and Load To. Then this dialog box pops up. Be sure to tick “Add this data to the Data Model”. The data needs to be in the Data Model in order for PowerMap to work on it after this.

powerquery-step8

10. The results are a number of rows retrieved from the REST endpoint. Look man, no JSON 🙂

powerquery-step9

11. If you want to look under the hood, I happen to be “tailing” the log of my node.js Azure website. Here’s proof that it’s the same 3,014 rows being returned. It took some 7 seconds to execute, which is what I meant in my previous post when I said I might not have optimized the mashup logic.

powerquery-step9-1

That’s it on the part of PowerQuery. Let’s do the fun stuff of visualizing this on PowerMap.

1. Map the geography and map level by selecting the lat and long fields.

powermap-step1

 

2. Select the columns which we want to visualize in the PowerMap.

powermap-step2

 

3. Change the width of the “skyscrapers” and the colors of course, and VOILA, you get this bird’s-eye view of where the “action” happens. In this case, I walked the most around the Sydney CBD area. I attended the Mobile Monday Sydney meeting a couple of weeks ago.

powermap-step3

 

When I showed this to my wife the other day, she asked why calories were burned even when I was sitting idle on the bus. But then she answered her own question when she said “oh yeah, we burn calories as long as we are breathing!” LOL 😀

 

 

 

GetFitYall REST API

This API exposes a single function at the moment, which does the following:

  • Mashup on demand – Let client apps consume a mash up of fitness activity data points from different target APIs based upon user ID and time period for a specific date. Note: User ID is not implemented right now because my authorization website is not implemented fully yet.

Common usage of GetFitYall API
1. Self-service analytics tools such as PowerQuery and PowerMap pull activity data points from the HTTP/S endpoint(s) based upon query parameters such as user ID, date, and time period.

1. Get activities mash up

GET http://getfityall-api.azurewebsites.net/mashup?<userID>&ondate=YYYY-MM-DD&aftertime=HH:mm&beforetime=HH:mm

E.g.,

http://getfityall-api.azurewebsites.net/mashup?ondate=2014-08-04&aftertime=10:00&beforetime=23:00

Description:

Gets a mashup of activity data points from different fitness APIs based upon user ID and matching timestamps. Currently supports Fitbit intraday API and Strava API. In order to get activity data points down to 1-minute detail level, this API function only works for a specific date as required by the Fitbit intraday API.

Query parameters:

  • userID – The Fitbit user ID which has been authenticated and authorized by the Fitbit OAuth API.
  • ondate – The specific date from which to pull the Fitbit activity data points. This works with 1 specific day because the Fitbit Get Intraday Time Series function only allows fetching a time series for a specific day, but the data points are down to 1-minute detail for that day. See https://wiki.fitbit.com/display/API/API-Get-Intraday-Time-Series
  • aftertime – The start of the period, in the format HH:mm.
  • beforetime – The end of the period, in the format HH:mm.

Returns: Content-Type = application/json

HTTP status codes:

  • 200 – OK
  • 400 – Error in request
  • 500 – Error in processing

Example response:

[
  {
    "type": "fitbit_strava",
    "date": "2014-07-20",
    "data": [
      {"datetime":"2014-07-20T10:00:00.000Z","latitude":-33.869406,"longitude":151.120498,"distance":12959.4,"altitude":8.5,"steps":105,"calories":13.890000343322754,"floors":0,"elevation":0},
      …. omitted for brevity….
      {"datetime":"2014-07-20T10:00:00.000Z","latitude":-33.869369,"longitude":151.120443,"distance":12966.1,"altitude":8.3,"steps":105,"calories":13.890000343322754,"floors":0,"elevation":0}
    ]
  }
]
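Just to illustrate consuming this endpoint outside of PowerQuery, here is a minimal C# sketch (not part of the project itself) that calls the mashup API with HttpClient and prints the JSON payload; the query string values are the same ones used in the example request above.

using System;
using System.Net.Http;
using System.Threading.Tasks;

class MashupClient
{
    static void Main()
    {
        RunAsync().GetAwaiter().GetResult();
    }

    static async Task RunAsync()
    {
        var url = "http://getfityall-api.azurewebsites.net/mashup" +
                  "?ondate=2014-08-04&aftertime=10:00&beforetime=23:00";

        using (var http = new HttpClient())
        {
            HttpResponseMessage response = await http.GetAsync(url);
            Console.WriteLine("HTTP status: {0}", (int)response.StatusCode);

            if (response.IsSuccessStatusCode)
            {
                // The API returns Content-Type: application/json; here we just dump the payload.
                string json = await response.Content.ReadAsStringAsync();
                Console.WriteLine(json);
            }
        }
    }
}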

This API is implemented as a node.js app and deployed to a free/basic Azure website from WebMatrix. The reason why I chose node.js is mostly to learn a new server-side technology. The other reasons are:

  1. More programmable for mashup logic.
  2. Leverage many third-party Node.js modules such as node-strava, node-fitbit to rapidly develop prototypes of the GetFitYall API.
  3. Scalable, as I can configure this as an always-on basic Azure website and scale out accordingly.
  4. Due to the lightweight nature of node.js, the node.js app can handle a large amount of traffic with low overhead.

Obvious Bottleneck

One potential problem is the sheer number of mashup requests. When I invoked this endpoint from PowerQuery, I saw that there were 3 requests for each query; kind of weird, but I’m not keen to find out why. Multiplied by thousands if not tens of thousands of users, it is pretty obvious that this would become a serious problem. My recommendations to address this bottleneck are as follows:

  1. Cache the HTTP Response

A simple solution is to cache the response for similar requests (based on the same query parameters). The main benefits of caching the response are reduced latency and network traffic.

2. Mashup On-demand Optimization

The node.js app makes asynchronous calls to the Fitbit API to retrieve steps, calories out, floors, and elevation separately, because this is how the Fitbit API works. A JavaScript promise is used to determine when all calls have returned before processing the next step. This is another benefit of using node.js, and by further using a 3rd-party module such as Q, callback hell can be avoided.

IoT Descriptive Analysis using PowerBI

Now comes the interesting part, which is self-analytics of all the data that I have collected from “the Internet of My Things” (IoMT). As a recap, I am currently ingesting activity data points from 2 devices: a Fitbit One and a Samsung S4 running 2 “sensor apps”, Strava and MapMyWalk. But it shouldn’t be limited to this, as I also have a Garmin Edge 705 with a heart-rate monitor (HRM) to track my MTB rides and a Polar FT40 wrist watch, also with HRM, to track other activities such as badminton and swimming (yeah, my one and only wearable device which works under water). I have a small disclaimer: I’m not a regimented fitness geek. I just want to make sense of my activities. It all started with mountain biking and I just want to know how often I ride and for how long, to try to justify to my wife why I bought 2 mountain bikes! 🙂 During my rides, I wear an HRM because I just don’t want to over-exert myself during those steep climbs. When I looked at my dashboards I realized there is so much information that helped me gain insights into what I’m doing well and what I’m not. It helps me be better when I ride or play sports, all without “killing” myself.

I chose PowerQuery and PowerMap, 2 very nifty PowerBI add-ins. I just want things to be simple and nothing beats self-analytics using a friendly tool like Excel (my wife is quite an Excel junkie from her previous life). These add-ins are available as free downloads from Microsoft to enhance the data access and data visualization capabilities of Microsoft Excel 2013. You should search for the latest download links. Using these tools I could retrieve data from a variety of sources and integrate that data as part of my Excel data model.

I’m particularly impressed that in the “Internet of my Own Things” the data generated was pretty sizeable. There were over 8000 Fitbit data points and 6000 Strava data points over a few days. And this is just for myself; imagine opening this up to more devices and more users? Obviously we need a solution that is cloud-scale to make this work. If you are doing self-analytics using Excel 2013, you may want to install the 64-bit version of Excel. Your Excel may crash working on all that data. I’d crashed the PowerQuery and PowerMap add-ins a few times and sent in feedback to Microsoft; they asked me if I could reproduce it, I said yeah, when I work on huge datasets, and they recommended I use the 64-bit version. Remember to download and install the corresponding 64-bit versions of the add-ins too.

powerqueries

 

This data was retrieved from my Azure Table Storage, which my Worker Roles diligently inserted into (see my previous post). You could also import data from other sources, which include Facebook; that’s pretty fun. Imagine being able to compare my activities with my buddies’. I am a member of a couple of mountain biking groups in Strava. This could be a side-project later on.

azuretablesource

 

You need your storage account details such as the name and the storage primary key which you can get from your Azure management portal.

After I had retrieved data from the 2 Azure storage tables which stored my Fitbit and Strava data points, I could “mash” them y’all. The function for this is Merge in the PowerQuery ribbon. First I select the Strava table, and then the Fitbit table. This is because there are more data points in Strava mapping out my lat/long coordinates, versus 1 Fitbit data point recorded at every one-minute interval. Then I select the datetimestamp column to match on, and I only want to include matched rows. I name my merged query Getfityall. Voila, I had just mashed up both data sources without writing any code! I had tried to write the code to do the matching, but I don’t think my code was all that good; I had nested for loops. I then tried to use JSONPath, but the JSONPath library I used was painfully slow. So guess what, I just let Excel do what it does best! However, in a later post I will talk about how to import the mashed-up data into Excel by calling a REST endpoint that returns the mashed-up data in JSON. And in that implementation I do have nested for loops written in node.js! (yeah please do LOL :))
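If I ever revisit the matching code, a dictionary keyed by the minute timestamp would avoid the nested loops. Here is a minimal sketch in C#; FitbitPoint and StravaPoint are hypothetical stand-ins for my actual entities, and it assumes at most one Fitbit data point per minute, which is what the intraday API gives me.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the real Fitbit/Strava data point types.
class FitbitPoint { public DateTime Time; public int Steps; public double CaloriesOut; }
class StravaPoint { public DateTime Time; public double Latitude; public double Longitude; }
class MashedPoint { public StravaPoint Strava; public FitbitPoint Fitbit; }

static class Mashup
{
    // Match each Strava point to the Fitbit point recorded in the same minute.
    // One pass to build the lookup, one pass to join: O(n + m) instead of O(n * m).
    public static List<MashedPoint> Merge(IEnumerable<FitbitPoint> fitbit,
                                          IEnumerable<StravaPoint> strava)
    {
        var fitbitByMinute = fitbit.ToDictionary(
            f => new DateTime(f.Time.Year, f.Time.Month, f.Time.Day,
                              f.Time.Hour, f.Time.Minute, 0));

        return strava
            .Select(s => new
            {
                Point = s,
                Minute = new DateTime(s.Time.Year, s.Time.Month, s.Time.Day,
                                      s.Time.Hour, s.Time.Minute, 0)
            })
            .Where(x => fitbitByMinute.ContainsKey(x.Minute)) // matched rows only, like the merge in PowerQuery
            .Select(x => new MashedPoint { Strava = x.Point, Fitbit = fitbitByMinute[x.Minute] })
            .ToList();
    }
}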

Things to note when you merge the 2 tables in Excel: you have to adjust the column formats, especially for date/time and steps (by making it a whole number). Otherwise PowerMap doesn’t understand the format of your data and is unable to render it. Then remember to load the data into the Data Model; this is required by PowerMap.

Next you insert a PowerMap. It is not available as a ribbon on its own. Rather, go to the Insert ribbon and click Map; you will notice a Launch PowerMap option. I wonder why such a powerful feature is tucked away here.

insertpowermap

Create a new PowerMap tour and the fun begins. Select the latitude and longitude columns. It should automatically map correctly. Next I select other columns such as DateTime, Steps and CaloriesOut.

powermap-map

Be sure not to aggregate your columns under height, otherwise you get weird-looking “skyscrapers”. Next I configure the layer options by making the thickness smaller, to about 25%; otherwise I get fat buildings and I can’t even see the route on the roads when I click play. I also changed the layer colors accordingly. I chose the national colors of Australia, green and gold, to represent my steps and calories.

sceneoptions

Turn on the map labels. Change the playback speed. Pan, zoom in and zoom out and just play around with the map. Then create a video and it’s cool! You could even add a soundtrack!

There you go, self-service analytics of data captured from the Internet of my things. This is just descriptive analytics; I’m only visualizing data that I already have. Next you could advance to predictive and prescriptive analytics, which opens up many more possibilities. In another post I will share my thoughts on how far we have progressed in being able to derive value out of your own IoT solutions and projects in such an accelerated manner. And the best part is that you only focus on what you do best, without worrying about the underlying plumbing and infrastructure. It just works!

Family Funday Sunday


I’m going to jump the gun by showing a sneak preview of my GetFitY’all activities mashup in the form of a PowerMap! I’ll describe what happens behind the scenes in a later post. Meanwhile, take a back seat and enjoy watching a PowerMap of my last Family Funday Sunday.


 

Scalable Event Hub Processor in an Azure Worker Role

So what happens after all those activity data point messages have been fired off into an Azure Event Hub? Event Hubs enable me to durably store these messages/events until the required retention period expires, which is 24 hours by default. I need to process those messages fast. That sounds like the starting point of big data, where millions of data points have been ingested and the data needs to be processed so someone can step in and try to make some sense out of it. Enter a scalable event hub processor implemented as an Azure Worker Role. Fortunately the good guys on the Azure team already thought about this, and someone published a code project on how to scale out event processing with Event Hubs. The EventProcessorHost simplifies the workload distribution model so I can focus on the actual data processing rather than handling the checkpointing and fault tolerance, the so-called “real enterprise architecture” stuff which sometimes people tell me I don’t know much about, but that’s ok. 🙂

First things first: I could have implemented this as another WebJob based on a custom trigger that fires as events/messages are ingested into my Event Hub. But no, I wanted to try something else, hence I chose to host the Event Hub Processor in an Azure Worker Role. This also makes scaling sense because I can auto-scale the worker role instances based on the number of events/messages in the Event Hub. This is like déjà vu: my favourite Azure demo which I did years ago was how to do auto-scaling based upon a queue metric. Each Worker Role hosts an instance of the Event Processor Host, which processes events from all of the default 16 partitions in the Event Hub. In OnStart(), the Worker Role does the following to start the host; the rest of the code from the project above is used as-is.

WorkerRole.cs code snippet:

public override bool OnStart()
{
    consumerGroupName = EventHubConsumerGroup.DefaultGroupName;
    eventHubName = CloudConfigurationManager.GetSetting("EventHubName");
    string hostPrefix = CloudConfigurationManager.GetSetting("HostName");

    // Derive a unique host name per role instance so that EventProcessorHost
    // can distribute the Event Hub partitions across all the instances.
    string instanceId = RoleEnvironment.CurrentRoleInstance.Id;
    int instanceIndex = 0;
    if (!int.TryParse(instanceId.Substring(instanceId.LastIndexOf(".") + 1), out instanceIndex)) // On cloud.
    {
        int.TryParse(instanceId.Substring(instanceId.LastIndexOf("_") + 1), out instanceIndex); // On compute emulator.
    }
    hostName = hostPrefix + "-" + instanceIndex.ToString();

    StartHost().Wait();
    return base.OnStart();
}
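StartHost() itself isn’t shown in the post. For context, here is a minimal sketch of what it typically looks like when using EventProcessorHost; the configuration setting names are assumptions, not the project’s actual keys.

private async Task StartHost()
{
    // Assumed setting names: the Event Hub connection string and a storage
    // account used by EventProcessorHost for its lease and checkpoint blobs.
    string eventHubConnectionString = CloudConfigurationManager.GetSetting("EventHubConnectionString");
    string storageConnectionString = CloudConfigurationManager.GetSetting("StorageConnectionString");

    var host = new EventProcessorHost(
        hostName, eventHubName, consumerGroupName,
        eventHubConnectionString, storageConnectionString);

    // Registers GetFitYallActivityProcessor as the IEventProcessor for the
    // partitions this host instance acquires leases for.
    await host.RegisterEventProcessorAsync<GetFitYallActivityProcessor>();
}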

The processing logic is defined in the GetFitYallActivityProcessor class and specifically in an async method called ProcessEventsAsync(). The code appears below:

GetFitYallActivityProcessor.cs code snippet:

public Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    // here is the place for you to process the received data for further processing.
    // suggest you keep it simple, fast and reliable.
    try
    {
        List<FitbitActivityEntity> fitbitEntities = new List<FitbitActivityEntity>();
        List<StravaActivityEntity> stravaEntities = new List<StravaActivityEntity>();
        // the generic type parameters were stripped in the original post; the mashed-up entity type name below is assumed
        List<GetFitYallActivityEntity> getfityallEntities = new List<GetFitYallActivityEntity>();
        foreach (EventData message in messages)
        {
            byte[] msgBytes = message.GetBytes();
            var m = string.Format("{0} > received message: {1} at partition {2}, owner: {3}, offset: {4}", DateTime.Now.ToString(), Encoding.UTF8.GetString(msgBytes), context.Lease.PartitionId, context.Lease.Owner, message.Offset);
            Trace.WriteLine(m);
            string type = message.Properties["Type"].ToString();
            string _id = type + "_" + Guid.NewGuid().ToString();
            switch (type)
            {
                case "Fitbit":
                    // convert an EventData message into an Azure Table Entity / row
                    var eventBody = Newtonsoft.Json.JsonConvert.DeserializeObject<FitbitDataPoint>(Encoding.Default.GetString(msgBytes));
                    FitbitActivityEntity activity = new FitbitActivityEntity(message.PartitionKey, _id);
                    string[] dtstr = eventBody.Time.GetDateTimeFormats('g', CultureInfo.CreateSpecificCulture("en-AU"));
                    activity.DateTime = dtstr[7].ToString();
                    activity.Steps = eventBody.Steps;
                    activity.StepsLevel = eventBody.StepsLevel;
                    activity.CaloriesOut = eventBody.CaloriesOut;
                    activity.CaloriesOutLevel = eventBody.CaloriesOutLevel;
                    fitbitEntities.Add(activity);
                    break;
                case "Strava":
                    // convert an EventData message into an Azure Table Entity / row
                    // (StravaDataPoint is assumed here; this generic parameter was also stripped in the original post)
                    var seventBody = Newtonsoft.Json.JsonConvert.DeserializeObject<StravaDataPoint>(Encoding.Default.GetString(msgBytes));
                    StravaActivityEntity sactivity = new StravaActivityEntity(message.PartitionKey, _id);
                    // ... the rest of the Strava handling and the batched table inserts are omitted for brevity ...

It’s pretty straightforward: just deserialize the “payload” from JSON into the same object which I constructed in the WebJob message pump. Essentially this simple implementation just retrieves all the data point properties and especially makes sure that the datetimestamp values from all my data points are in a consistent format. The reason is that I need to match these rows using a self-analytics BI tool; PowerBI is currently my tool of choice. Each data point is stored as a row in an Azure Storage table. Why? Because it’s pretty scalable and I can actually use an Azure Storage table as a data source to be queried in PowerQuery. Good enough for me for now. A little bit of optimization is available in my code, which performs the insert operations in batches. I borrowed this from some sample code I found but I couldn’t find it again to refer to it here. A batch operation may contain up to 100 individual table operations, with the requirement that each operation’s entity must have the same partition key. That’s exactly what I did. Besides reducing the latency, this is also for monetary reasons: it saves on the number of transactions to my Azure Storage. In the Internet of Things, I need to think about scale, remember? The end result: Azure storage tables storing all my activity data points, ready to be “mashed up y’all”.
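For reference, here is a minimal sketch of what those batched inserts look like with the classic Azure Storage SDK (Microsoft.WindowsAzure.Storage). It is illustrative rather than my exact code; in the processor, something like this would be called for each entity list once the message loop completes.

using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;

static class TableBatchHelper
{
    // Inserts entities in batches of up to 100 operations, which is the
    // maximum batch size for Azure Table Storage. All entities passed in are
    // assumed to share the same partition key, another batch requirement.
    public static void InsertInBatches(CloudTable table, IList<ITableEntity> entities)
    {
        const int maxBatchSize = 100;

        for (int i = 0; i < entities.Count; i += maxBatchSize)
        {
            var batch = new TableBatchOperation();
            foreach (var entity in entities.Skip(i).Take(maxBatchSize))
            {
                batch.Insert(entity);
            }
            table.ExecuteBatch(batch); // one storage transaction per batch instead of one per row
        }
    }
}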

stravatable

 

fitbittable

So far so good. In my next post I will talk about the fun stuff: using Excel for more than just tracking my expenses in a spreadsheet. This is self-service analytics, Big Data style!

Using Azure WebJobs as an IoT data points ingestor

My GetFitY’all project has evolved again, obviously. Previously I intended to implement the message pump functionality as a RESTful endpoint that could be called automatically from some form of “cron job” in the cloud. But I changed course because a simpler approach could be used, which is Azure WebJobs. It works well in my case because I already have a GetFitY’all Azure website, which serves to provision users and to allow them to authorize GetFitY’all server-side to ingest data points from the various activity sources. My data point sources have expanded to include the MapMyFitness API besides the Strava API and Fitbit API.

Implementing the WebJob is simple as I just pulled out the Azure WebJobs sample solution from ASP.NET CodePlex. I chose to implement a manual trigger which starts the method to ingest activity data points from my data sources. The code looks like the following:

static void Main(string[] args)
{
    JobEnvInitializer();

    // Host the WebJob and invoke the manual trigger once.
    JobHost host = new JobHost();
    host.Call(typeof(Program).GetMethod("ManualTrigger"));
    host.RunAndBlock();
    Console.WriteLine("\nDone");
}

[NoAutomaticTrigger]
public static void ManualTrigger([Table("ActivityDatapointsProcessingLog")] CloudTable logTable)
{
    // Kicks off ingestion of activity data points from the various APIs,
    // using the bound ActivityDatapointsProcessingLog table for logging.
    GetFitYallAsync(logTable).Wait();
}

The data points ingestor cum message pump is meant to simulate a device gateway. If only I could get my hands on the Azure Intelligent Systems Service (ISS) limited beta, I could further use the ISS software agent embedded in my “device gateway”. This is illustrated in the device topologies below:

Source: BD516 Building an Intelligent Systems Business with Microsoft Azure Services, https://connect.digitalwpc.com/Pages/SessionDetail.aspx?sessionId=9040ae74-c1b6-e311-8491-00155d5066d7

Leveraging the ISS agent on the gateway is good because all the IoT workflow would be managed by ISS, including sending the data points as queue messages. Currently I have implemented some of the workflow on my own as some form of message pump (see below). Ideally I just want to focus on the simple device gateway logic of ingesting data points from multiple device APIs, and of course the fun parts of doing self-analysis of my IoT data points.

The “device gateway” acts as a message pump to send each data point as a message via AMQP to the Azure Event Hub. This is meant to decouple the processing of these messages from the message pump. The Azure Event Hub is a highly scalable publish-subscribe ingestor that can take in millions of events per second and handle huge throughput.

That piece of code looks like this:

// use RedDog.ServiceBus helper class to generate the SAS for this sender/device
var sas = EventHubSharedAccessSignature.CreateForSender(senderKeyName, senderKey, serviceNamespace, hubName, deviceName, new TimeSpan(0, 120, 0));
var factory = MessagingFactory.Create(ServiceBusEnvironment.CreateServiceUri("sb", serviceNamespace, ""), new MessagingFactorySettings
{
    TokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider(sas),
    TransportType = Microsoft.ServiceBus.Messaging.TransportType.Amqp
});
var client = factory.CreateEventHubClient(String.Format("{0}/publishers/{1}", hubName, deviceName));

// iterate through the 4 Lists and compose an ActivityDataPoint object and fire off the message using Event Hub
int cnt = intraDaySteps.DataSet.Count;
var procTasks = new List<Task>();
for (int i = 0; i < cnt; i++)
{
    FitbitDataPoint adt = new FitbitDataPoint();
    adt.Time = intraDaySteps.DataSet[i].Time;
    adt.Steps = intraDaySteps.DataSet[i].Value;
    adt.StepsLevel = intraDaySteps.DataSet[i].Level;
    adt.CaloriesOut = intraDayCaloriesOut.DataSet[i].Value;
    adt.CaloriesOutLevel = intraDayCaloriesOut.DataSet[i].Level;

    // serialize the data point as JSON and tag it so the processor knows its type
    var data = new EventData(System.Text.Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(adt)));
    data.PartitionKey = userName;
    data.Properties.Add("Type", "Fitbit");
    procTasks.Add(client.SendAsync(data));
}
await Task.WhenAll(procTasks);
procResult.Success = true;

A FitbitDataPoint object is constructed for each data point retrieved from the Fitbit API. This is then serialized into JSON and added as EventData. The partition key and a simple property identifying this message as a Fitbit data point are set on the EventData object. Finally, a simple SendAsync() sends it to the Event Hub using Microsoft.ServiceBus.Messaging.TransportType.Amqp. I am using RedDog.ServiceBus, a really nifty library which you can install using NuGet. Full credit goes to Sandrino Di Mattia; I learned a lot from his blog post about IoT with Azure Service Bus Event Hubs.

Similar code goes for the Strava and MapMyFitness data points. When I’m done, I just have to publish this WebJob to Azure. This can be performed in Visual Studio itself, but please make sure that you have installed Visual Studio 2013 Update 3, because the WebJobs deployment features are included in Update 3. I won’t describe the steps here because there is already a great article about How to Deploy Azure WebJobs to Azure Websites. Also check out Get Started with the Azure WebJobs SDK.

publishwebjob

 

All is fine and dandy: I can execute my WebJob, and the message pump works like a charm, firing off the AMQP messages to my Event Hub. But there’s a problem which I haven’t been able to troubleshoot: my WebJobs always abort.

webjobs-aborted

 

Maybe it is due to what Amit Apple describes: WebJobs only run continuously in an “Always On” Azure website, and this is not available for free and shared websites. But then my WebJob is not meant to run continuously; it just runs and finishes after firing the message pump, so the WebJobs dashboard should display a successful run. *scratch head*

As a final confirmation that there’s indeed some “action” on the part of my Event Hub, here’s my GetFitYall Azure Event Hub dashboard.

eventhubdashboard