GraphStory helps AgSmarts enable the next agricultural revolution: The Agri Internet of Things

Over the last 200 years, the world has witnessed the agriculture industry make tremendous strides in producing more crops with less labor and overall better agriculture practices. The amount of labor needed to grow 100 bushels of corn has gone from 80 hours in 1850 to 2 hours today. In the same period, the yield for one acre has gone from 40 bushels to over 160 in good years.

Even with these advances in agriculture, farmers must continue to innovate to be successful. Farmers manage their fields while needing to understand dozens of variables, from changes in weather patterns to the impact of seeding patterns. In addition, farmers in California – as well as other areas of the US – have faced the harsh realities of severe drought conditions. Water scarcity is only one of the effects of a rapidly changing climate that is affecting farming.

High Tech in Agricultural Progress

AgSmarts, a two-year-old startup based in Memphis, TN, is focused on helping farmers by providing unprecedented and specific crop information that, in turn, lowers their operational expenses and optimizes crop yields. AgSmarts’ Precision Ag includes moisture-sensing technology, predictive analytics, and farm equipment automation that represent an innovative revolution in data-driven agriculture.

Farmers are furnished with smart sensors that connect wirelessly to the cloud and collect and analyze data from their fields, including real-time weather, crop and other variables. This data helps the crops tell the story about the impact of farm management decisions and changing conditions and, ultimately, can provide direction in order to produce greater yields.

Big Data helps Agronomics

Wireless technologies and the internet of things have made it possible to create affordable field sensors that constantly measure critical data like temperature and soil moisture. Data collected by the sensors is uploaded to the cloud. AgSmarts combines the sensor data with weather data, crop growth and many other data points and stores them in a graph database. Graph databases are optimized to quickly surface relationships in large datasets – turning data into actionable insights.

“Graph Story has been instrumental in getting our cloud application up and running in such a short time. Their hosted graph database gives us all the advantages of a scalable, lightning fast, modern big data solution without any of the hassles of maintaining it and worrying about scalability” says Clayton Plymill – CTO and co-founder of AgSmarts.

In partnering with AgSmarts, Graph Story has helped with the design and implementation of its groundbreaking application as well as hosting and support. Our service allows AgSmarts to focus on their customers and their core business, and provides an affordable, scalable platform to help further their goals in providing amazing technology in agriculture.

To get started with the Graph Story platform, sign up for our free trial or contact us with any questions!

_______________________

About AgSmarts: Based in Memphis, TN, AgSmarts is a Precision Ag technology company that offers remote wireless sensing, predictive irrigation and crop management analytics, and equipment automation that collectively represent a revolution in data-driven agriculture. AgSmarts’ platform combines hardware and software solutions into a versatile, powerful and cost effective suite of tools that producers, researchers and agronomic consultants can use today in the struggle to conserve natural resources, control operational costs and maximize crop yields. For more information about AgSmarts, please visit www.agsmarts.com

 

Comparing Graph and Relational Databases

When comparing graph databases to relational databases, one thing that should be clear up front is data affiliation does not have to be exclusive. That is, graph databases – and other NoSQL options – will likely not replace relational databases on the whole. There are well-defined use cases that involve relational databases for the foreseeable future.

However, there are limitations – particularly the time as well as the risk involved in making additions or updates to a relational database – that have opened up room to use alternatives or, at least, consider complementary data storage solutions.

In fact, there are a number of use cases where relational databases are often a poor fit for the goals of certain data, such as social relationships at scale or intelligent recommendation engines. Overall, the limitation of how a relationship is defined within a relational database is a main reason to consider a switch to a graph database.

Also, industries of all types are seeing exponential data growth and the type of data that is growing fastest is unstructured. It doesn’t fit well in columns & rows – AKA the relational database schema.

Using the schema-less database model found in graph databases is a huge benefit for applications and the developers that maintain them. If the application can drive the data model, it fits better with the development cycle and can reduce risk when making model changes later.

The relationships in a property graph, like nodes, can have their own properties, such as a weighted score. With that capability, it is relatively trivial to add a new property to a relationship. It’s especially useful when the relationship was not defined when the application was initially created.

For applications that use a relational database, this would be done by creating a join table, as it is known in the RDBMS world. This new table joins together two other tables and allows for properties to be stored about that relationship. While this is a common practice, it adds a significant layer of complexity and maintenance that does not exist within the graph database world.

Yet another reason you might consider moving to a graph database is to remove the work-arounds that must be used to make an application fit within a relational database. As discussed in the previous example, a join table is created in order to have metadata that provides properties about relationships between two tables.

Often, new relationships will need to be created, which requires yet another join table. Even if it has the same properties as the other join table, it must be separate in order to ensure the integrity of the relationships.

In the case of graph databases, typed relationships can exist between more than just two types of nodes. For example, a relationship called “LIKES”, e.g. (person)-[:LIKES]->(book), can also be applied to other node types, e.g. (person)-[:LIKES]->(movie). In fact, the relationship type could be applied between any of the applicable nodes in the graph.
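To make the contrast with join tables concrete, here is a minimal in-memory sketch in plain Ruby. It is not a graph database, and TinyGraph, Node and Rel are hypothetical names invented for this illustration. It shows one LIKES relationship type connecting a person to both a book and a movie, with each relationship carrying its own properties.

```ruby
# Toy property graph: nodes and relationships both carry properties.
# Illustrative only; TinyGraph is a hypothetical name, not a Neo4j API.
Node = Struct.new(:label, :props)
Rel  = Struct.new(:from, :type, :to, :props)

class TinyGraph
  attr_reader :rels

  def initialize
    @rels = []
  end

  # Connect any two nodes with a typed relationship and optional properties.
  def relate(from, type, to, props = {})
    rel = Rel.new(from, type, to, props)
    @rels << rel
    rel
  end

  # All nodes reachable from `node` over relationships of the given type.
  def outgoing(node, type)
    @rels.select { |r| r.from.equal?(node) && r.type == type }.map(&:to)
  end
end

graph  = TinyGraph.new
person = Node.new(:Person, { name: "Ann" })
book   = Node.new(:Book,   { title: "Dune" })
movie  = Node.new(:Movie,  { title: "Alien" })

# The same relationship type works across different node labels,
# and each relationship stores its own weight.
graph.relate(person, :LIKES, book,  { weight: 0.9 })
graph.relate(person, :LIKES, movie, { weight: 0.4 })

liked_labels = graph.outgoing(person, :LIKES).map(&:label)
puts liked_labels.inspect
```

In a relational schema the two LIKES pairings would typically need two separate join tables; here they are simply two entries of the same relationship type.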

Another reason to consider graph databases over relational databases is what can be referred to as “join hell”. While creating a single join can be relatively trivial, those types of joins provide the least expressive data. Applications very often require joins over several tables, and it is in this case that the big expense of joins begins to show – in both development time and application performance. In addition, if you wanted to modify the join query, it might also require adding more join tables – adding even more complexity to development and worsening application performance.

In a graph database, adding new relationships and the queries that represent them occurs at the application level. This removes a level of development complexity and time and offers better application performance.

While there are many differences between graph and relational databases, there are a few similarities. A significant one is that both can achieve what is known as ACID compliance. It is this set of principles that guarantees that transactions completed by the database are processed reliably, which keeps data safe and consistent.

Why use a Graph Database?

What do LinkedIn, Walmart and eBay as well as many academic and research projects have in common? They all depend upon graph databases as a core part of their technology stack.

Why have such a wide range of industries and fields found a common relationship through graph databases?

The short answer: graph databases offer superior and consistent speed when analyzing relationships in large datasets and offer a tremendously flexible data structure.

As many developers can attest, one of the most tedious pieces of applications dependent on relational databases is managing and maintaining the database schema.
While relational databases are often the right tool for the job, there are some limitations – particularly the time as well as the risk involved in making additions or updates to the model – that have opened up room to use alternatives or, at least, consider complementary data storage solutions. Enter NoSQL!

When NoSQL databases, such as MongoDB and Cassandra, came along they brought with them a simpler way to model data as well as a high degree of flexibility – or even a schema-less approach – for the model.
While document and key-value databases remove many of the time and effort hurdles, they were mainly designed to handle simple data structures.

However, the most useful, interesting and insightful applications require complex data as well as allow for a deeper understanding of the connections and relationships between different data sets.

For example, Twitter’s graph database – FlockDB – more elegantly solves the complex problem of storing and querying billions of connections than their prior relational database solution. In addition to simplifying the structure of the connections, FlockDB also ensures extremely fast access to this complex data. Twitter is just one use case of many that demonstrate why graph databases have become a draw for many organizations that need to solve scaling issues for their data relationships.

Graph databases offer the blend of simplicity and speed all while permitting data relationships to maintain a first-class status.

While offering fast access to complex data at scale is a primary driver for graph database adoption, another reason is that they offer the same tremendous flexibility that is found in so many other NoSQL options. The schema-free nature of a graph database permits the data model to evolve without sacrificing any of the speed of access or adding significant and costly overhead to development cycles.

With the intersection of graph database capabilities, the growth of interest in graph databases and the trend towards more connected, big data, graph databases can increase the speed of applications as well as of the overall development cycle, and they are well placed to grow as a leading alternative to relational databases.

Uncovering Open Source Community Stories with Neo4j

Every dataset has a story to tell — we just need the right tools to find it.

At Graph Story, we believe that graph databases are one of the best tools for finding the story in your data. Because we are also active members of several open source communities, we wanted to find interesting stories about those communities. So, we decided to look at package ecosystems used by developers.
The first one we tackled was Packagist, the community package repository for PHP. Nearly 20,000 maintainers have submitted over 60,000 packages to Packagist, which gives us a lot of interesting data to investigate.

How We Used Neo4j to Graph the Packagist Data

Collecting this data and getting it into Neo4j was relatively straightforward.

One HTTP endpoint on the Packagist site returns a JSON array of all the package names. We iterated over that, and made individual calls to another endpoint to retrieve a JSON hash for each package, which includes both base package data and information on each version of the package, including what packages a given version requires.

The data model for our initial version was pretty straightforward. We have three node labels:

  • Package
  • Maintainer
  • Version

and five relationship types:

  • HAS_VERSION
    (Package)-[:HAS_VERSION]->(Version)
  • MAINTAINED_BY
    (Package)-[:MAINTAINED_BY]->(Maintainer)
  • REQUIRES
    (Version)-[:REQUIRES]->(Package)
  • REQUIRES_DEV
    (Version)-[:REQUIRES_DEV]->(Package)
  • SUGGESTS
    (Version)-[:SUGGESTS]->(Package)
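
As a rough illustration of how one package document could be mapped onto the model above, here is a sketch in plain Ruby. The JSON layout (a "package" object with "maintainers" and a "versions" hash whose entries hold "require", "require-dev" and "suggest") is an assumption made for this example, and the sample data is invented. It produces plain triples rather than writing to Neo4j.

```ruby
require "json"

# Map Composer-style keys to the relationship types in the model above.
DEPENDENCY_TYPES = {
  "require"     => :REQUIRES,
  "require-dev" => :REQUIRES_DEV,
  "suggest"     => :SUGGESTS
}.freeze

# Turn one package document into [from, relationship, to] triples.
# The JSON layout is an assumption made for this sketch.
def edges_for(package)
  name  = package.fetch("name")
  edges = package.fetch("maintainers", []).map do |m|
    [name, :MAINTAINED_BY, m.fetch("name")]
  end

  package.fetch("versions", {}).each do |version, data|
    version_id = "#{name}@#{version}"
    edges << [name, :HAS_VERSION, version_id]
    DEPENDENCY_TYPES.each do |key, rel|
      deps = data[key]
      next unless deps.is_a?(Hash) # skip missing or empty sections
      deps.each_key { |dep| edges << [version_id, rel, dep] }
    end
  end
  edges
end

# Invented sample document, not real Packagist data.
sample = JSON.parse(<<~DOC)
  {
    "package": {
      "name": "acme/widgets",
      "maintainers": [{ "name": "alice" }],
      "versions": {
        "1.0.0": {
          "require": { "psr/log": "^1.0" },
          "require-dev": { "phpunit/phpunit": "^4.0" }
        }
      }
    }
  }
DOC

edges = edges_for(sample.fetch("package"))
edges.each { |from, rel, to| puts "(#{from})-[:#{rel}]->(#{to})" }
```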

This certainly isn’t a complete schema to represent everything within the Packagist ecosystem, but it let us do some interesting analyses:

  1. What packages get required the most by other packages?
  2. What maintainers have the most packages?
  3. What maintainers have the most requires of their packages?
  4. What maintainers work together the most (packages can have multiple maintainers)?
  5. What are the shortest paths between two given packages, or two given maintainers?

Our Findings

You can see our results so far at packagist.graphstory.com.

Some of what we found was expected: certain well-known open source component libraries get required the most, like doctrine/orm and illuminate/support.

It gets more interesting when examining maintainers, though. Some are high profile folks in the PHP community, like fabpot and taylorotwell, but some are people with whom we weren’t as familiar. It certainly made us re-examine what we thought we knew about the PHP community – it’s not always folks who are speaking at conferences that are making big contributions.

The shortest path analyses were interesting as well. There were a few packages that showed up in these paths over and over to tie together maintainers and packages, such as psr/log. “Keystone packages” might be a good term for these, because they seem to join and support the PHP open source community again and again.
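
To show what a shortest-path question looks like in miniature, here is a breadth-first search sketch in plain Ruby over an invented toy graph. Neo4j provides its own shortest-path support in Cypher, so this is only an illustration of the idea, not how the results on the site were computed.

```ruby
# Breadth-first search over an undirected adjacency list.
# Returns the shortest path as an array of node names, or nil if none exists.
def shortest_path(adjacency, start, goal)
  queue   = [[start]]
  visited = { start => true }

  until queue.empty?
    path = queue.shift
    node = path.last
    return path if node == goal

    adjacency.fetch(node, []).each do |neighbor|
      next if visited[neighbor]
      visited[neighbor] = true
      queue << (path + [neighbor])
    end
  end
  nil
end

# Invented toy data: two maintainers whose packages both depend on psr/log.
adjacency = {
  "alice"        => ["acme/widgets"],
  "acme/widgets" => ["alice", "psr/log"],
  "psr/log"      => ["acme/widgets", "beta/tools"],
  "beta/tools"   => ["psr/log", "bob"],
  "bob"          => ["beta/tools"]
}

path = shortest_path(adjacency, "alice", "bob")
puts path.join(" -> ")
```

In this toy graph the only route between the two maintainers runs through psr/log, which is the kind of connecting role the “keystone package” idea describes.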

A Cypher Example: Finding Top Maintainers by Packages

Here’s one example Cypher query we ran to find the top Packagist maintainers by package count:

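// Count packages per maintainer, keep maintainers with more than one, highest first.
// { limit } is the parameter syntax of the Neo4j version used at the time.
MATCH (m1:Maintainer)<-[:MAINTAINED_BY]-(:Package)
WITH m1, COUNT(*) AS count
WHERE count > 1
WITH m1, count
ORDER BY count DESC
RETURN m1.name AS name, count
LIMIT { limit }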

See the results of this query and others on packagist.graphstory.com.
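
For readers who are new to Cypher, the same steps can be sketched in plain Ruby over a small invented list of package and maintainer pairs. This is only meant to mirror what the query computes; the data and the top_maintainers name are made up for the example.

```ruby
# Invented sample of MAINTAINED_BY pairs: [package, maintainer].
maintained_by = [
  ["acme/widgets", "alice"],
  ["acme/gears",   "alice"],
  ["acme/gears",   "bob"],
  ["beta/tools",   "bob"],
  ["beta/cli",     "bob"],
  ["solo/thing",   "carol"]
]

# Same steps as the query: count per maintainer, keep counts > 1,
# sort descending, then apply the limit.
def top_maintainers(pairs, limit)
  pairs
    .group_by { |_package, maintainer| maintainer }
    .map { |maintainer, rows| [maintainer, rows.size] }
    .select { |_maintainer, count| count > 1 }
    .sort_by { |_maintainer, count| -count }
    .first(limit)
end

top = top_maintainers(maintained_by, 10)
top.each { |name, count| puts "#{name}: #{count}" }
```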

Why We Used a Graph Database

Much of what we’ve done would be possible with an RDBMS or a document database, so why do it in a graph database – specifically Neo4j?

We found three major upsides while working on this project:

  1. It is so much easier to map out data and relationships. Making relationships in RDBMSes work, even in simple cases, is harder, and significantly more difficult to change down the road. Compared to popular document databases, Neo4j relationships are done in the database — we don’t have to maintain them with application logic.
  2. Discovering how people and packages are connected is much easier and faster than with RDBMSes and popular document databases. Cypher and the graph model make it easy to get the data we want without complex SQL joins or a wrapper script in another language.
  3. Trying new queries to explore the data is so convenient with Neo4j’s web interface. It’s quick and easy to prototype and profile from there, and then copy and paste the Cypher into your app.

We’re obviously big believers in graph databases at Graph Story, but this is a fun project that highlights a lot of the advantages of Neo4j. We found a number of interesting stories in Packagist, and there are certainly more to uncover.

GraphConnect Europe 2015: (graphs)-[:ARE]->(everywhere)

We couldn’t agree more with the theme for this year’s GraphConnect in London: Graphs are everywhere!

Graph databases are the future, and Neo4j is clearly leading the pack. As a Neo4j partner, we are excited about the road ahead and about building out an amazing graph database as a service platform.

Neo4j database ranking chart

So to everyone at GraphConnect this week: We hope you’ll enjoy all the sessions, and make sure not to miss the closing keynote “Impossible is Nothing with Graphs” by Dr. Jim Webber. You will be sure to enjoy his insights!

Graph Story selected for SXSW Accelerator Competition

We’re excited to share that Graph Story was selected to participate in the Enterprise and Smart Data Technologies category for the 7th annual SXSW Accelerator competition presented by Oracle.

The SXSW Accelerator competition is the marquee event of SXSW Interactive Festival’s Startup Village, where leading startups from around the world showcase some of the most impressive new technology innovations to a panel of hand-picked judges and a live audience. Hundreds of companies applied to present at SXSW Accelerator, and Graph Story was selected as one of 48 finalists across six different categories.

The two-day event will be held the first weekend of SXSW Interactive, Saturday, March 14 through Sunday, March 15, on the sixth floor of the Downtown Austin Hilton. The pitch competition will then culminate with the SXSW Accelerator Awards Ceremony on Sunday evening, March 15, where winning startups from each category will be announced and honored.

“Over the past six years of companies competing in SXSW Accelerator, more than 50 percent have gone on to receive funding in excess of $1.7 billion and 12 percent of the companies have been acquired,” said SXSW Accelerator Event Producer, Chris Valentine. “This year’s finalists have all demonstrated the capability to change our perception of technology and we expect to see them achieve similar, if not greater, success than our past finalists.”

The Accelerator competition will feature finalists across categories including Enterprise and Smart Data Technologies, Entertainment and Content Technologies, Digital Health and Life Sciences Technologies, Innovative World Technologies, Social Technologies, and Wearable Technologies. SXSW Category Sponsors include Rackspace, Enterprise and Smart Data Technologies Category Sponsor; IBM, Innovative World Technologies Category Sponsor; and Dyn, Social Technologies Category Sponsor.

Graph Kit for Ruby Part 3: Neo4j, Spree – Engine Yard deployment

Welcome to the third and final installment of the Graph Kit for Ruby post series.  Part 1 kicked the series off with a brief look at the idea of a graph database and some description of the Spree online store I planned to enhance with a graph. Part 2 went in depth with the addition of a graph-powered product recommendation system to a Spree online store.  In this final entry we’ll learn how to tweak our Spree + Neo4j store to deploy it to a production server on Engine Yard Cloud.

Provisioning

Engine Yard deployment of the Spree application worked in three major phases: provisioning the server, configuring the server, and pushing the code.  Provisioning runs for ten minutes or so, and then you have a new server running.  Next up – SSH into the server to do the last-mile config before your first deployment.

Oops!  My new server didn’t have my SSH keys and I couldn’t figure out an easy way to get them installed after the provisioning.  Since I was still in a happy prototyping mode I just deleted the server and then uploaded my SSH keys to my Engine Yard account under Tools -> SSH Public Keys -> Add a new SSH public key.  You’ll want to do the same if you’re following along at home.  If you don’t have a key yet, I recommend GitHub’s explanation on what SSH keys are and how to get one.  Once you’ve got your keys uploaded you can safely move on to the ‘boot a new server’ part of the Engine Yard setup.

engine-yard-panel

The Engine Yard cloud servers look to be hosted somewhere on Amazon Web Services.  Once I got my keys sorted out I created a new application in the control panel and named it graphkit_ruby. I chose some pretty standard Rails app defaults – the latest available version of Ruby, the latest available version of Postgres, and Passenger as the web server.  Engine Yard does offer SSL for real production stores but I didn’t bother since I’m not planning to sell these fake pet products.

Configuration

Using environment variables for configuration on Engine Yard

Engine Yard has provided us with an app server and an RDBMS, which covers the basics of Spree.  To get our new graph-powered recommendation engine running we’ll also need access to a production graph database.  I signed up for a free trial database from the Graph Story front page.  To integrate our external Graph Story Neo4j database with Engine Yard we’ve got to have a nice safe way to store our database credentials and pass them to the Rails app at boot time.  I’ve gotten in the habit of using environment variables to configure my production applications so I can keep such secrets out of the codebase.  Newer versions of Rails support this practice with the addition of a secrets.yml file, but in this case I found it easiest to just use my own custom env file, config/env.custom, with the dotenv gem.

To do the same for your app, add the dotenv gem to your Gemfile and then you’ll be able to read environment variables from a text file at run time. This wound up working well with Engine Yard – I just put the file in a shared config folder that is consistently available to the app from one deploy to the next.

We’ll force Rails to load our environment variables from config/env.custom at boot time by setting up a config/boot.rb file that preloads our variables:

boot.rb
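
The original boot.rb listing is not reproduced here. As a stand-in, this is a dependency-free sketch of the general idea in plain Ruby: read KEY=VALUE lines from a file into ENV before the app loads. The post itself uses the dotenv gem for this, and load_env_file and the demo variable name are hypothetical names chosen for the example.

```ruby
require "tempfile"

# Read KEY=VALUE lines from a file into ENV, skipping blanks and comments.
# Existing ENV values win, so real environment settings are not overwritten.
def load_env_file(path)
  return {} unless File.exist?(path)

  loaded = {}
  File.foreach(path) do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    next if key.nil? || value.nil?

    key   = key.strip
    value = value.strip.gsub(/\A["']|["']\z/, "") # drop surrounding quotes
    ENV[key] ||= value
    loaded[key] = ENV[key]
  end
  loaded
end

# Demo with a temporary file standing in for config/env.custom.
Tempfile.create("env.custom") do |file|
  file.write("# secrets live here\nGRAPHKIT_DEMO_URI=http://localhost:7474\n")
  file.flush
  load_env_file(file.path)
end

puts ENV["GRAPHKIT_DEMO_URI"]
```

In a boot file, the call would point at the real file instead, for example load_env_file(File.expand_path("../env.custom", __FILE__)).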

I .gitignored this file full of configuration and secrets so the one I’m using locally won’t be automatically pushed to GitHub or to Engine Yard. We’ll push it up to our EY server with scp:

scp config/env.custom my-ey-server-name:/data/my-app-name/shared/config/

Note that I was able to omit the full EY login string because I have my EY server hostname and credentials set up locally in ~/.ssh/config. If you don’t do that you’ll have to spell out the connection info like scp filename deploy@ey-server-ip-address:destination-folder/ instead.  That shared config directory is automatically symlinked into the config subdirectory of each new deployment to EY.

To integrate this custom environment setup with Rails I went ahead and created a custom Neo4j initializer file for Rails that teases apart a database URL-style configuration into the sort of thing that the Neo4j gem is actually looking for.  This means that I can punch in a NEO4J_URI variable of the form https://username:password@autogenerated-hostname.do-stories.graphstory.com:portnumber and Rails will automatically connect to my remote database.  With a fallback of localhost:7474 we can seamlessly switch between local Neo4j in dev mode and our actual Graph Story hosted database in production.  Speaking of which, you’ll want a free hosted Neo4j database of your own.  You can of course sign up from the Graph Story home page. Here’s what the connection info looks like from within my Graph Story admin panel – I copied the server connection information from here into my env.custom file and formatted it into a NEO4J_URI string that I configured Rails to recognize via my Neo4j initializer file.
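
The URL-splitting part of such an initializer can be sketched with Ruby’s standard URI library. This is not the initializer from the project: neo4j_settings, DEFAULT_NEO4J_URI and the hash keys are assumptions for illustration, and the hostname and credentials are placeholders. How the resulting values are handed to the neo4j gem depends on the gem version, so that step is left out.

```ruby
require "uri"

DEFAULT_NEO4J_URI = "http://localhost:7474"

# Split a database-URL-style string into separate connection settings.
# The hash keys are illustrative; pass them on however your Neo4j client expects.
def neo4j_settings(raw = ENV["NEO4J_URI"])
  uri = URI.parse(raw.to_s.empty? ? DEFAULT_NEO4J_URI : raw)
  {
    url:      "#{uri.scheme}://#{uri.host}:#{uri.port}",
    username: uri.user,
    password: uri.password
  }
end

# Placeholder credentials and hostname, not real connection details.
production = neo4j_settings("https://user:secret@example-host.invalid:7473")
local      = neo4j_settings(nil)

puts production[:url]
puts local[:url]
```

Falling back to the local URL when the variable is unset is what lets the same code run in development and production.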

graph-story-panel

Creating a production secret token to sign cookies

Rails 4.1 uses a secrets.yml file that is .gitignored much like our above env.custom to hold production secrets. I have never messed with those myself but I did notice that it was looking for ENV["SECRET_KEY_BASE"] to set a production secret token for signing sessions. Let’s go ahead and generate one of those and tack it on to the production secret file we already created and then we’re (almost) in business.
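
If you want to see what such a token looks like, one way to produce a value of that kind is Ruby’s standard SecureRandom library. This is a sketch of one option rather than the exact command shown in the screenshot.

```ruby
require "securerandom"

# 64 random bytes rendered as 128 hexadecimal characters.
secret = SecureRandom.hex(64)
puts "SECRET_KEY_BASE=#{secret}"
```

The printed line can be appended to the env file on the server; it should never be committed to the repository.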

secret-key

Deploying the code and seeding the database

Setting the production secret was the last step in getting my EY environment to play nice with Spree!  From there I clicked the “deploy HEAD” button in my EY panel and it pulled up the latest code from the Graph Kit Ruby repository on GitHub.  Once the code was finally deployed and the app was running I went into my Spree console and ran my database seeds to get an admin user created and to gin up all of those pretend products for our sample data.  That’s RAILS_ENV=production rake db:seed from within your app’s deployment directory on the server. Mine was /data/graphkit_ruby/current as shown in the secret key screenshot above.

finished-store

Next Steps for a Real-World Project

Asynchronous Data Processing

For a high performance production application you wouldn’t really want end users to wait for the round trip between Engine Yard and Graph Story every time we log a new purchase event to the graph. It’d be much smoother to use a background job to send that data over. I’d use Sidekiq if this were a client project – it’s a great Ruby library for background job processing and it comes with a nice job status visualizer.  By offloading the graph writes to a background job you allow the web app to respond that much faster.  It’s common to do the same with transactional emails and any post-order processing in a high volume Spree site.
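
The pattern itself can be shown with nothing but Ruby’s built-in Thread and Queue. This is a toy stand-in, not Sidekiq: GraphWriteQueue and send_to_graph are hypothetical names, the "write" just appends to an array, and nothing is persisted if the process dies, which is one of the things a real job system handles for you.

```ruby
# A single worker thread drains a queue of purchase events so the
# caller can return immediately. send_to_graph is a stand-in for the
# real network call to the graph database.
class GraphWriteQueue
  attr_reader :written

  def initialize
    @queue   = Queue.new
    @written = []
    @worker  = Thread.new { drain }
  end

  # Called from the request path: enqueue and return right away.
  def enqueue(user_id, product_slug)
    @queue << [user_id, product_slug]
  end

  # Signal the worker to stop and wait for queued events to be written.
  def shutdown
    @queue << :stop
    @worker.join
  end

  private

  def drain
    while (event = @queue.pop) != :stop
      send_to_graph(*event)
    end
  end

  def send_to_graph(user_id, product_slug)
    @written << "#{user_id} purchased #{product_slug}"
  end
end

jobs = GraphWriteQueue.new
jobs.enqueue(1, "red-shirt")
jobs.enqueue(2, "green-pillow")
jobs.shutdown
puts jobs.written
```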

Richer Recommendations

Once you get started down the road of tracking purchase events you quickly realize there’s lots of other data you can start tracking to use for better recommendations. Here are a few ideas: “Users who looked at this also looked at that”, “users in your area also purchased this”, “users who bought this often buy that in the same order”.  You can also look at copying more of your product and user metadata over to your graph nodes in order to query on product characteristics or user demographics.

Now that you’ve seen how straightforward it is to model nodes and relationships with Neo4j you can imagine how you might start layering your own user location data or per-cart data into your graph for richer recommendations.  I hope you’ve had as much fun reading this series as I did writing it!

Graph Kit for Ruby Part 2: Bootstrapping a Spree Store with Neo4j Integration

Welcome to the second installment of the Graph Kit for Ruby series, which covers setting up a Ruby project and Neo4j integration.  In the first post I described the plan for the project: to showcase the ease of use and business value of graph databases in the context of a Ruby project.  Today we’re digging into the implementation of a Neo4j-backed product recommendation engine.  This recommendation engine will sit atop an online store built with the Rails-based e-commerce project Spree.

Brief note on prerequisites and recommended experience level

This post assumes that the reader has a basic understanding of the Ruby ecosystem.  You’ll need a newer version of Ruby (2.0+ preferred) and the bundler utility.  You might also need to be able to install some packages on your system like the PostgreSQL and Neo4j database engines.  I used Ubuntu to prototype this but you should be able to get by fine with any recent version of Linux or OSX.  The finished project is published on Graph Story’s GitHub account as the Graph Kit for Ruby, so feel free to dig in for specific implementation details.  If at any point you get stuck feel free to reach out to the Graph Story team for help.  Here we go!

Starting a new Spree Project

You’ll want to follow the recommendations in the official Spree Getting Started Guide to get your workspace set up.   Note that you’ll start with the usual rails new installer before getting into Spree-specific setup.  Here are the basic shell commands to get your Rails project started with Spree loaded into its Gemfile:

# get some gems
gem install rails -v 4.1.6
gem install bundler
gem install spree_cmd

# make a new rails project
rails _4.1.6_ new graph-kit-ruby
cd graph-kit-ruby

# install spree into the gemfile and run its generators
spree install --auto-accept

Additional Gems

Look through the completed project’s Gemfile on GitHub and you’ll see several other important gems:

  • neo4j. This Object-Graph Mapper (OGM) gem aims to provide full-featured Neo4j access with an ActiveRecord feel. Rails devs with minimal Neo4j experience should appreciate the familiarity.
  • pg. You’ll need an RDBMS for Spree, and I prefer PostgreSQL.
  • dotenv. I’m going to deploy this to Engine Yard, and dotenv came in very handy. More on that in post 3.
  • jazz_hands. This one provides pry and several other convenience tools for working at the Ruby console. I find Pry in particular to be very helpful when learning a new library such as neo4j.

Building a Recommendation Engine with Neo4j

My idea here was to model Users, Products, and Purchases in a graph so that we could easily identify purchasing patterns to use in our recommendations. Spree already has a Spree::User and a Spree::Product model. Purchases are modeled in the RDBMS as Spree::LineItem rows that associate a product with a specific order. I’ll create a Recs module inside my project and give it User and Product models that are linked together by purchase histories.  The Product model is where most of the graph action happens, so let’s focus there.

Designing a “product” node type using the neo4j OGM

recs-product-rubymine

Spree gives us a products table in Postgres out of the box, so what we want to do is set up a Product node in Neo4j for each product row in Postgres.  With a bit of work we can ensure that each new Postgres product automatically creates a matching graphed product node.  I created a :graphed method on the Spree::Product model that finds or creates a matching Product in Neo4j on demand by calling into the self.from_spree_product(spree_product) method in the screenshot above.

Important things to notice in the Recs::Product model

  • Each node has a slug property – this is the unique identifier Spree uses in product URLs and in its relational database to distinguish one product from another.  Adding that to the companion node ensures I can link products from one database to the other and back.
  • The model class gets most of its functionality by including Neo4j::ActiveNode.  This gives us ActiveRecord-like semantics for finding, creating, updating, and removing nodes.
  • It has_many purchases as connections to User nodes.  This is where the edges in our graph come from and it’s how we’ll make meaningful queries against our graph.

Breaking down a Cypher query generated with the neo4j gem

The most interesting thing about this product model is of course the Cypher query (Cypher is Neo4j’s declarative query language) that allows us to surface related purchase data.  Let’s break it down line by line:

This query is executing in the context of an already selected Product node. We’ll refer to this starting point in the query itself as :product.

query_as(:product).

Identify all products which have been bought by users who have also bought the :product this query is built on.

  match("product<--(user:`Recs::User`)-->(other_product:`Recs::Product`)").

Discard products in the result set that are equivalent to the initial :product.

  where("other_product <> product").

Limit our results to a few products so that Neo4j and Spree can spit out results faster.

  limit(limit).

Return an array of unique products that match the other_product node in our match statement. This means any products bought by people who bought :product should be a valid result.

  pluck('DISTINCT other_product')

If you squint at that query you can see a sort of (Product)<--(User)-->(Other Product) relationship going on in the match statement.  “Queries that look like whiteboarded graphs” seems to be a design goal of the Cypher query language used in Neo4j. As a new user I can say it is pretty easy to get started with.
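
To see the logic of that query without a database, here is the same idea sketched in plain Ruby over an invented purchase history. The data and the standalone users_also_bought function are made up for this illustration; the real project runs the Cypher shown above through the neo4j gem.

```ruby
# Invented purchase history: user => slugs of purchased products.
PURCHASES = {
  "mr-green"  => ["green-shirt", "green-pillow"],
  "ms-pillow" => ["green-pillow", "red-pillow"],
  "mr-red"    => ["red-shirt", "red-pillow"]
}.freeze

# Products bought by anyone who also bought `product`, excluding
# `product` itself, de-duplicated and capped at `limit`.
def users_also_bought(purchases, product, limit = 3)
  purchases
    .select { |_user, products| products.include?(product) }
    .flat_map { |_user, products| products }
    .reject { |other| other == product }
    .uniq
    .first(limit)
end

recs = users_also_bought(PURCHASES, "green-pillow")
puts recs.inspect
```

Each step lines up with one part of the query: the select and flat_map play the role of the match, the reject is the where clause, uniq is DISTINCT, and first(limit) is the limit.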

Automatically logging new purchase events as connections in our graph

Once we implement our Recs::User and Recs::Product model with their ‘purchase’ connection type, all we need to do is automate the logging of purchase events from Spree and PostgreSQL over to our Neo4j database.  Here’s how to do that:

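# Record this purchase as a relationship between the graphed user and product.
def log_to_graph
  # Skip when there is no user, or the user has no graph node.
  return unless user.try(:graphed)
  user.graphed.purchases << product.graphed
end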

Let’s make some fake yet interesting purchase history data

In order to demonstrate Neo4j’s ability to easily unearth interesting connections in our data set, I decided to create some pretend customers with very consistent purchasing habits. For instance, a Mr. Green might only buy green products, while a Ms. Pillow might buy any pillow in the store regardless of its color. You can see the methodology used to generate this sample data in the graph-kit-ruby repository on GitHub.

fake-purchases
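As a rough illustration of the idea (not the generator from the graph-kit-ruby repository), rule-driven shoppers can be sketched in a few lines of plain Ruby. The color and product-family lists, SampleProduct, and fake_purchases are all assumptions chosen for this example.

```ruby
# Hypothetical catalog dimensions, loosely based on the products named in this series.
COLORS   = %w[red green pink yellow].freeze
FAMILIES = %w[shirt leash collar pillow].freeze

SampleProduct = Struct.new(:color, :family) do
  def slug
    "#{color}-#{family}"
  end
end

# Every color/family combination becomes one product.
CATALOG = COLORS.product(FAMILIES).map { |color, family| SampleProduct.new(color, family) }.freeze

# Each pretend customer is a name plus a rule for what they will buy.
SHOPPERS = {
  'Mr. Green'  => ->(product) { product.color == 'green' },
  'Ms. Pillow' => ->(product) { product.family == 'pillow' }
}.freeze

# Returns shopper name => slugs of every product matching that shopper's rule.
def fake_purchases(catalog: CATALOG, shoppers: SHOPPERS)
  shoppers.transform_values do |wants|
    catalog.select(&wants).map(&:slug)
  end
end

fake_purchases.each { |name, slugs| puts "#{name}: #{slugs.join(', ')}" }
```

Because each shopper follows a single rule, overlaps such as the green pillow are easy to predict, which makes the recommendations easy to sanity-check by eye.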

Inspecting the data in the Neo4j browser with the Cypher query language

One of my a-ha moments when getting into Neo4j was discovering the built-in web server and its visualization tools. After you’ve stuffed some nodes and relationships into your dev database you can visualize them by poking around at http://localhost:7474.  You can click on the “purchases” relationship to see a visualization of the entire product purchase history graph.  I wanted to dig in a bit deeper though so I got my Recs::Product model to give me some Cypher help.  You can learn more about Cypher on Neo4j’s site.

ruby-to-cypher

Using only the neo4j gem’s built-in methods and the Cypher syntax we covered above I’ve isolated a single Product node and gotten a good lead on how to look it up by hand using my own Cypher query.  Note the :to_cypher method in the screenshot above which generates a working query from your Ruby code just like :to_sql in ActiveRecord.  Unfortunately for me pasting that directly into the Neo4j browser didn’t quite work, but it got me close enough.  I tweaked the WHERE clause to look for product.slug = 'red-shirt' rather than the parameterized product_id query :to_cypher gave me and then I added RETURN product, user, other_product to the end.  Once I’d fixed up the Cypher I was able to get a neat visualization of the red shirt, the one user in my test data who’d bought it, and all the other things that user purchased.  Shirts, all shirts!

red-shirt-graph-in-browser
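For reference, the hand-edited query described above can be reconstructed roughly as follows. This is a sketch and not the exact output of :to_cypher: the MATCH pattern, the slug lookup, and the RETURN list come from this post, while the helper name, the label on the starting product node, and the slug check are assumptions.

```ruby
# Builds the hand-tweaked Cypher for pasting into the Neo4j browser.
# Slugs are restricted to lowercase letters, digits, and hyphens so the value
# can be placed inside the quoted literal safely (an assumption of this sketch).
def also_bought_cypher(slug)
  raise ArgumentError, "unexpected slug: #{slug.inspect}" unless slug.match?(/\A[a-z0-9-]+\z/)

  <<~CYPHER
    MATCH (product:`Recs::Product`)<--(user:`Recs::User`)-->(other_product:`Recs::Product`)
    WHERE product.slug = '#{slug}'
    RETURN product, user, other_product
  CYPHER
end

puts also_bought_cypher('red-shirt')
```

Returning all three identifiers, rather than only other_product, is what lets the browser draw the user in the middle of the picture.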

Integrating our product recommendations with the Spree storefront

Now that we’ve generated our sample data and figured out how to query Neo4j for simple product recommendations, let’s add them to our storefront and call it a day.  I wired up the product recommendations directly into the Spree::Product model as :users_also_bought.  That delegates to the :users_also_bought method from the relevant Recs::Product node and returns the first three results.  Armed with that easy lookup I dropped a new section into the product detail view with a _users_also_bought.html.erb partial template:

<% if (products = product.users_also_bought).any? %>
  <div class='users-also-bought'>
    <h3>Users also bought:</h3>
    <%=render partial: 'spree/shared/products', locals: { products: products } %>
  </div>
<% end %>

My favorite thing about this partial is that it leverages a built-in Spree products partial: all I have to do is pass it a local variable named products, to which I’ve assigned the results of product.users_also_bought.  There’s really nothing going on here other than looking up the data and passing it along to the built-in.
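The delegation from Spree::Product to its graph node, described above, can be sketched in plain Ruby as follows. RecsNode and ShopProduct are hypothetical stand-ins for Recs::Product and Spree::Product, and returning an empty array when no graph node exists is an assumption of this sketch rather than something the kit is known to do.

```ruby
# Hypothetical stand-in for a Recs::Product graph node.
class RecsNode
  def initialize(recommended)
    @recommended = recommended
  end

  # The real method runs the graph query; here it just trims a fixed list.
  def users_also_bought(limit = 3)
    @recommended.first(limit)
  end
end

# Hypothetical stand-in for Spree::Product.
class ShopProduct
  attr_reader :name, :graphed

  def initialize(name, graphed = nil)
    @name = name
    @graphed = graphed
  end

  # Delegate to the graph node and keep the first three results.
  # Falling back to [] lets a view's `.any?` check hide the section.
  def users_also_bought
    graphed ? graphed.users_also_bought(3) : []
  end
end

node = RecsNode.new(%w[red-pillow red-leash red-collar red-shirt])
puts ShopProduct.new('Red Bowl', node).users_also_bought.inspect
puts ShopProduct.new('Ungraphed Bowl').users_also_bought.any?
```

Keeping the lookup behind one method on the store model means the view never needs to know that a graph database is involved.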

product-recommendations-in-place

Final post: Deploying our Ruby graph kit to Engine Yard and Graph Story

For the third and final post in this series we’ll cover the sysadmin work required to deploy your working graph-enhanced Spree site to production.  We’ve chosen to deploy to Engine Yard Cloud, so most of the post will focus on configuration specific to their environments.  You’ll also see how to switch from a local Neo4j server in development to a production-ready Graph Story server by layering in the appropriate connection strings.

Ruby Neo4j Part 1: Let’s Build a Recommendation Engine for a Spree Store

Hi there! My name is Daniel and I’m a consulting Ruby developer here in Memphis, Tennessee. The past year or two I’ve mostly focused on building online stores.  At Graph Story’s invitation I recently set out to build a Ruby project demonstrating a simple yet valuable integration with a graph database.  The little I knew about graphs before this project was all theoretical so I decided to start by learning more about Neo4j, the open source graph database technology behind Graph Story’s service offerings.  Poking around online to find graph database resources I wound up at Neo4j’s homepage and found that some of its founders published a book with O’Reilly. They were giving away copies in exchange for email addresses. Sold! The book mentioned several good use cases for graph databases, but recommendation engines jumped off the page for me.

You’ve undoubtedly seen automated recommendation tools in action around the web.  The first one that came to mind for me is the one at the bottom of every product page on Amazon saying “Customers Who Bought This Item Also Bought…“.  Turns out we can make our own fairly easily with a graph database!

other-recommendation-suggestions

What will our graph kit starter project do?

We’re going to have an online store with Ruby & Neo4j.  It’s going to sell pet products in various colors and product families.  You might see a pink leash, a yellow collar, or a red pillow.  We’ll whip up some artificial customers with interesting buying patterns and have them purchase a few hundred things from our online pet supply store.  Once that’s done, we’ll be able to dig in to the graph our purchase activities have generated and uncover some useful data about customer purchasing preferences.  Armed with this new data, we’ll set up an Amazon-style “customers who bought pink pillows also bought pink leashes” pane on every product page of our store in hopes of increasing sales with relevant recommendations.

pink-shirt-graph

You can see in the image above how a pink shirt appealed to one customer who collects pink things but it also appealed to another customer who collects shirts.  Digging these connections out of our graph will make it easier for us to present customers with targeted recommendations of things they might actually want to buy.

Where can I get the code?

All code for this project is shared on Graph Story’s GitHub account. The Graph Kit for Ruby is a mostly stock Spree store with some simple Neo4j integrations that will give you a head start on building your own high-value graph solutions.  If you read the source code closely you can probably figure out where we’re going in posts 2 and 3 in this series 😉

How will we deploy it?

We’re going to use Engine Yard to run our app server and a PostgreSQL database to store the basic product catalog and order history.  New products and new orders are also going to be streamed directly to a Neo4j database so that we can build our recommendations in real time.  We’ll take a fairly naive approach to this streaming in the interests of keeping this series of posts easily digestible.

Up next: building the sample store and integrating it with Neo4j

Stay tuned for post 2 in this series: “Bootstrapping a Spree Store and Integrating With Neo4j” where I’ll walk through the steps required to create your own Spree store and load up those fake customers, products, and purchases.  Once that’s in place we’ll layer in our “customers also bought” widget on the Spree frontend.  For post 3 we’ll walk through the process of deploying the working Spree app to an Engine Yard cloud server that integrates with a free trial graph database provided by Graph Story.

Moving Beyond the Social Graph: Neo4j Graph Database examples for ad-hoc analysis

If you’ve done any reading on graph databases, then you have likely come across the example of how graph databases can illustrate connections in a social media context. While graphs are excellent at handling social connections, this model just scratches the surface on the variety of use cases that can be addressed within a graph database.

Built-In Ad Hoc Analysis

Graph databases are an excellent match for experimental – or ad hoc – analysis. While many features in Neo4j enable this type of analysis, we will show two that are built into the Neo4j browser: data visualizations, and adding and removing relationships in a graph schema.

Seeing is Believing

When it comes to understanding data, few tools offer the visualization advantages that Neo4j does. While the Neo4j browser does offer a tabular view of the data, the visualization features create a more intuitive data representation for data analysts and reduce the critical time-to-insight factor of data analysis. Consider the following examples of how data can be illustrated with the out-of-the-box utilities in Neo4j.

During a recent graph implementation to help law enforcement analysts, we saw how theft incidents were related by location and time. In a few simple steps, we illustrated these incidents with visualizations that enabled analysts to summarize the events and make changes to patrol configurations. Here are a few Neo4j graph database examples.

usecase1

From the illustration above, we can see how analysts were able to quickly see the difference in theft activity between days (2 on 4/20 and 6 on 4/7). Using the same visualization, we changed the presented label of the data to expose the value of the theft activity.

usecase2

From this illustration analysts were quickly able to see that $290 worth of bikes were stolen on 4/20/2014. When we added a relationship between events to not only look at thefts on the same day, but also the same hour, the following visualization showed analysts more detailed information about when these events occurred.

usecase3

From this illustration analysts were quickly able to see that on 11/1/2010 at 2:00 pm, 3 theft events occurred. Finally, consider the following example where a relationship was added to the data to match events which occurred at the same latitude and longitude.

usecase4

The illustration here shows theft events totaling $320 at LNG: -75.059492750000004 and LAT: 40.047509900000001. This data was eventually correlated with a geolocation database, and an exact location was determined to be high risk and a candidate for increased patrols.

Experimental Connectedness

As demonstrated above, adding relationships like time and space to data is a powerful tool for experimental analysis in law enforcement. Being able to correlate when and where events take place allows these agencies to be proactive rather than reactive to theft events. So let’s take a look at how Neo4j enables this on-the-fly analysis.

With just 3 lines of code, we can add the relationship between theft events based on the theft date property of the theft node.

Create Theft Relationship Based on Date

MATCH (t:Theft), (u:Theft)
WHERE t.THEFT_DATE = u.THEFT_DATE
CREATE UNIQUE (t)-[:STOLEN_ON_SAME_DAY]->(u)

In this example, we first query the graph to find all theft events that have the same date and then create a relationship of type STOLEN_ON_SAME_DAY between them. In seconds this operation made it possible to correlate these events and produce the visualization provided above.

Create Theft Relationship Based on Date and Hour

MATCH (t:Theft), (u:Theft)
WHERE t.THEFT_DATE = u.THEFT_DATE AND t.THEFT_HOUR = u.THEFT_HOUR
CREATE UNIQUE (t)-[:STOLEN_ON_SAME_DAY_AND_HOUR]->(u)

Just by adding an additional AND condition to the statement for stolen on the same day, we were able to drill deeper into the relationship between theft events and uncover events which occurred during the same hour (represented by the :STOLEN_ON_SAME_DAY_AND_HOUR relationship).

Finally, let’s look at how we set up the relationship that correlates theft events that took place at the same latitude and longitude.

Create Theft Relationship Based on Geo-Location

MATCH (t:Theft), (u:Theft)
WHERE t.LAT = u.LAT AND t.LNG = u.LNG
CREATE UNIQUE (t)-[:STOLEN_AT_SAME_LOCATION]->(u)

Again, with a simple WHERE clause, we were able to link events by geo-location and represent the relationship as STOLEN_AT_SAME_LOCATION.
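All three statements follow one pattern: pair up nodes whose chosen properties agree. As a rough in-memory analogue (plain Ruby, no Neo4j involved), that pattern can be sketched as below. The Theft struct, the sample records, and link_by are hypothetical, and unlike the Cypher above this sketch never pairs a record with itself.

```ruby
# Hypothetical theft record; field names loosely follow the node properties above.
Theft = Struct.new(:id, :date, :hour, :lat, :lng, keyword_init: true)

# Made-up sample data for illustration only.
THEFTS = [
  Theft.new(id: 1, date: '4/20/2014', hour: 14, lat: 40.0, lng: -75.0),
  Theft.new(id: 2, date: '4/20/2014', hour: 14, lat: 40.1, lng: -75.1),
  Theft.new(id: 3, date: '4/20/2014', hour: 9,  lat: 40.0, lng: -75.0),
  Theft.new(id: 4, date: '4/7/2014',  hour: 9,  lat: 40.2, lng: -75.2)
].freeze

# Returns every ordered pair [t, u] of distinct records whose values agree on
# all of `keys`, the in-memory counterpart of matching two nodes and comparing
# one property per key in the WHERE clause.
def link_by(records, *keys)
  records
    .group_by { |record| keys.map { |key| record[key] } }
    .values
    .flat_map { |group| group.permutation(2).to_a }
end

same_day      = link_by(THEFTS, :date)
same_day_hour = link_by(THEFTS, :date, :hour)
same_location = link_by(THEFTS, :lat, :lng)

puts same_day.size
puts same_day_hour.map { |t, u| [t.id, u.id] }.inspect
puts same_location.map { |t, u| [t.id, u.id] }.inspect
```

Adding a key narrows the groups, just as each extra AND condition narrows the relationships created in the graph.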

Looking Ahead

From these cases, it should be clear that graphs can offer two very easy ways to analyze more than just social media data. After all, being able to relate data points by spatial and temporal elements is applicable to more areas than just law enforcement. From a retail perspective, this can lead to illustrations of product demand by store location and time of year. Apply the same relationships to customer purchasing, and you can gain insight into when and where customers are buying what products.

When you factor in the relative ease and speed with which this type of analysis can be performed, using graphs for analysis is a low-risk, high-reward exercise.

While social media projects get much of the limelight in graphs, the real power of graphs lies in moving beyond relational database constraints and putting the analytical strength of Neo4j to use.

with use case work and code provided by William Sharp