Category: Graph Use Cases

Graph Story Newsletter: Much Love for Beginners 💚💜💙💛

This week is all about BEGINNERS! Here are the best resources we could find for folks just starting out with graph databases and Neo4j.

What the heck is a graph database?


We created a short series on the what, why, and how of graph DBs, aimed at people who have never touched one. It’s important to get the basic concepts, but so many articles focus on big-picture stuff (BIG DATA AND MACHINE LEARNING AND NETWORKS AND STUFF). Our articles make it easy to understand what a graph is and how to make it work.

Getting Started with Neo4j


This tutorial is a nice intro to Neo4j by Neo Technology. It’s essentially the same thing as the built-in :play movie graph command in the Neo4j Web UI that you get with every Graph Story instance, but this version has more explanatory content to go along with it, as well as exercises and solutions.

Exploring networks with graph databases


Data journalist Leila Haddou has a great tutorial for beginners that focuses on exploring data relationships with Neo4j. It steps you through very basic Cypher, importing CSV datasets into Neo4j, and then finding connections that would be difficult or impossible to discover in a relational DB.

Neo4j Flask


Data scientist Nicole White created a small microblogging app powered by Neo4j and the Flask microframework for Python. It’s a great way to see Neo4j and Cypher at work in a real, running app.

Cypher Cheat Sheet


They call it a “Refcard,” but it’s more fun to call it a cheat sheet. I’m always forgetting how to do this or that in Neo4j, and having this quick reference close by is super handy.

• • •

Found other awesome resources for graph dbs or Neo4j? Let us know!!

Until next week,

Ed Finkler
CTO
Graph Story
http://graphstory.com
@graphstoryco
765-374-5388

Graph Story Newsletter: 10% Off Any New Instance 💰

Have we mentioned how much we love you lately? We love our customers, because you’re the best. Because you’re so great, we’ve got an awesome offer for you!


ONE WEEK ONLY: Get 10% Off ANY New Graph Story Instance for 12 months


We have the best prices in the graph game, but we love you SO MUCH that we’re gonna do you a real solid. 10% off any new instance, any price, any provider, for 12 months. That’s the kind of awesome stuff you get when you subscribe to the GS Newsletter. Only the link in this newsletter will work – accept no imitations.

10% off any new instance for 12 months

Offer expires in one week!

Create a CMS with Neo4j & Elasticsearch


John David Martin from Unicon has a cool post on the Neo4j blog about creating a CMS with Neo4j and Elasticsearch. He demonstrates how he built a tool that surfaces personally relevant content via text search, using the PageRank algorithm to score results.

Natural Language Processing Made Easy


Another William Lyon article on pulling data from an API and analyzing it caught my eye, this time about grabbing Best Buy product reviews and mining the text for opinions. William’s article Building a text adjacency graph from product reviews with the Best Buy API shows how you’d grab the data, but it hands off the natural language analysis itself to a couple of other blog posts.

The one I liked the most was Natural Language Analytics made simple and visual with Neo4j by Michael Hunger, which does a great job of demonstrating how you split up natural language into a graph structure, and then do stuff like find the most important phrase of the text.

William also pointed to Max de Marzi’s post Summarize Opinions With A Graph. It’s from a few years ago, but it’s still very relevant, focusing primarily on the concepts used in breaking natural language down into analyzable graphs.

Internationalization with CypherMessageSource, Spring and Neo4j


Eric Spiegelberg has a guest post on the GraphAware blog about internationalization powered by Neo4j. This focuses on using the Spring framework and an implementation of the MessageSource interface that retrieves message definitions from a Cypher-powered graph like Neo4j. I’d love to see examples like this for other stacks.

• • •

Are you doing something cool with graphs at Graph Story? Let us know! We want to talk about what our customers are doing here in the newsletter.

Until next week,

Ed Finkler
CTO, Graph Story

Graph Story Newsletter: Graphs Make Us Safe From Tainted Burritos

I stopped eating burritos from a major chain a while ago because I didn’t want to die, but they pulled me back in with their brown rice and delicious barbacoa. As we learn below, I chew a little more confidently now, knowing graphs are watching out for my food safety. Also you should buy a Graph Story instance now.

The benefits of graph databases in relation to supply chain

This headline sounds like some business nerd mumbo jumbo, but it has real benefits when you’re getting your burrito on and don’t want to get food poisoning. Chris Morrison, CEO at Transparency One, writes about the need for food brands to track supply chains, so they can know about potential disruptions or safety issues:

Graph database technology presented itself as a viable solution. That’s because graph databases recommend themselves as being uniquely well qualified at handling large, highly-connected volumes of data at scale.

There are a number of graph databases available, and as an example, Neo Technology’s Neo4j, which took 3 months to build, was tested with dummy data for several thousand products, and there were absolutely no performance issues.

As for the search response time, results were sent back within seconds.

Read the whole thing.

neovis.js

We’ve mentioned this before when talking about William Lyon’s awesome blog posts, but neovis.js is a promising JavaScript library that focuses on visualizing Neo4j data. The project could probably use some assistance from motivated individuals.

Neo4j With Scala: Migrate Data From Other Database to Neo4j

Part of the Neo4j With Scala blog series, this post focuses on importing data into Neo4j from PostgreSQL, MySQL, Cassandra, and Oracle. All of it is accomplished with the awesome APOC plugin, which we are happy to install on any Graph Story Neo4j instance.

• • •

Are you doing something cool with graphs at Graph Story? Let us know! We want to talk about what our customers are doing here in the newsletter.

Until next week,

Ed Finkler
CTO, Graph Story

Graph Story Newsletter: Leftpad wouldn’t have happened if you used Neo4j

I’m in Seattle this week to talk graphs at PNWPHP, but nothing can keep me from you, or writing this newsletter. NOTHING.

Graph Database with Neo4j and a .NET Client


If you’re using .NET, Chris Skardon and Michael Hunger have written a great intro to Neo4j on the .NET platform. It’s a quick intro to Graphs and Neo4j, plus the basics of calling Cypher queries from C#.

Oh, you wish you had Neo4j on Azure for your .NET application? Graph Story can do that for you!

Neo4j + KeyLines: The Developer’s Route out of Dependency Hell


Hey, remember leftpad? Oops! That sucked, right? Miro Marchi from Cambridge Intelligence breaks down how he used Neo4j and the KeyLines Toolkit to analyze NPM package dependencies and find what would be affected by the loss of a single package. He also analyzes code licenses and compatibility issues. Cool stuff!


William Lyon’s blog is filled with awesome Neo4j articles, and this one is no exception. It steps you through creating a content recommendation system based on links posted to Twitter using Python. It has great, detailed code examples, taking you from retrieving the data on Twitter to scoring links for recommendation in Neo4j.

• • •

Are you doing something cool with graphs at Graph Story? Let us know! We want to talk about what our customers are doing here in the newsletter.

Until next week,

Ed Finkler
CTO, Graph Story

Graph Story Newsletter for Friday, Sept 9, 2016

A Song of Vertices and Edges

Big RAM Neo4j Instances on Google Compute Engine


Do you need a phat, RAM-heavy virtual machine for Neo4j? It’s worth checking out what Graph Story offers on Google Compute Engine. A 4-core instance with 26GB RAM costs less than an AWS 4-core instance with 15GB. More RAM, less money, seems legit. More and more of our customers are getting on GCE because of the pricing we can offer. Check out our pricing page for more info.

NDP Episode #9: Graph Databases with Neo4j


On the latest episode of the NoSQL Database Podcast, host Nic Raboy is joined by Neo4j developer relations person Ryan Boyd to talk about some key advantages that graph databases have, and why they’re so much better than the alternatives at modeling relationships. There’s important info here for winning database arguments with your friends and colleagues.

Analyzing the Graph of Thrones


Man, isn’t Game of Thrones cool? Not the books because I don’t read, but the show with the blood and the nudity and the swords. That is awesome. You know what else is awesome? Graphs. Let’s combine them!

William Lyon did just that with his article “Analyzing the Graph of Thrones,” which breaks down analysis of the GoT social graph. It’s a great read, in particular because it does an excellent job of explaining several graph metrics like “betweenness centrality” and “pivotal nodes” and other smart words. Then the author pulls the data into python-igraph, the Python interface to the igraph network analysis library, to do things like calculate PageRank and run community detection. The examples are straightforward and comprehensible, and there’s tons to learn from here.

Power a Github Notification Bot for Issue Reviewers with Graph Based NLP


Our boy Christophe Willemsen over at GraphAware wrote up a really interesting post about building a bot that automatically chooses and notifies reviewers for GitHub pull requests. He uses the APOC procedures library and the GraphAware Natural Language Processing plugin for Neo4j, which is totally going to be available real soon now. It’s a good breakdown of how you would load external data into Neo4j and use NLP to analyze and draw conclusions about your dataset.

• • •

Are you doing something cool with graphs at Graph Story? Let us know! We want to talk about what our customers are doing here in the newsletter.

Until next week,

Ed Finkler
CTO, Graph Story

Graph Story helps AgSmarts enable the next agricultural revolution: The Agri Internet of Things

Over the last 200 years, the world has witnessed the agriculture industry make tremendous strides in producing more crops with less labor and overall better agriculture practices. The amount of labor needed to grow 100 bushels of corn has gone from 80 hours in 1850 to 2 hours today. In the same period, the yield for one acre has gone from 40 bushels to over 160 in good years.

Even with these advances in agriculture, farmers must continue to innovate to be successful. They manage their fields while juggling dozens of variables, from changes in weather patterns to the impact of seeding patterns. In addition, farmers in California – as well as other areas of the US – have been hit with the harsh realities of drastic drought conditions. Water scarcity is only one of the effects of a rapidly changing climate that is affecting farming.

High Tech in Agricultural Progress

AgSmarts, a two-year-old startup based in Memphis, TN, is focused on helping farmers by providing unprecedented and specific crop information that, in turn, lowers their operational expenses and optimizes crop yields. AgSmarts’ Precision Ag platform includes moisture-sensing technology, predictive analytics, and farm equipment automation that together represent an innovative revolution in data-driven agriculture.

Farmers are furnished with smart sensors that connect wirelessly to the cloud and collect and analyze data – real-time weather, crop conditions, and other variables in their fields. This data helps the crops tell the story of the impact of farm management decisions and changing conditions and, ultimately, can provide direction for producing greater yields.


Big Data helps Agronomics

Wireless technologies and the Internet of Things have made it possible to create affordable field sensors that constantly measure critical data like temperature and soil moisture. Data collected by the sensors is uploaded to the cloud. AgSmarts combines the sensor data with weather data, crop growth, and many other data points and stores them in a graph database. Graph databases are optimized for quickly analyzing relationships across large datasets – turning data into actionable insights.

“Graph Story has been instrumental in getting our cloud application up and running in such a short time. Their hosted graph database gives us all the advantages of a scalable, lightning fast, modern big data solution without any of the hassles of maintaining it and worrying about scalability,” says Clayton Plymill, CTO and co-founder of AgSmarts.

In partnering with AgSmarts, Graph Story has helped with the design and implementation of its groundbreaking application as well as with hosting and support. Our service allows AgSmarts to focus on their customers and their core business, and provides an affordable, scalable platform to help further their goal of bringing amazing technology to agriculture.

To get started with the Graph Story platform, sign up for our free trial or contact us with any questions!

_______________________

About AgSmarts: Based in Memphis, TN, AgSmarts is a Precision Ag technology company that offers remote wireless sensing, predictive irrigation and crop management analytics, and equipment automation that collectively represent a revolution in data-driven agriculture. AgSmarts’ platform combines hardware and software solutions into a versatile, powerful and cost effective suite of tools that producers, researchers and agronomic consultants can use today in the struggle to conserve natural resources, control operational costs and maximize crop yields. For more information about AgSmarts, please visit www.agsmarts.com

 

Why use a Graph Database?

What do LinkedIn, Walmart and eBay as well as many academic and research projects have in common? They all depend upon graph databases as a core part of their technology stack.

Why have such a wide range of industries and fields found a common relationship through graph databases?

The short answer: graph databases offer superior and consistent speed when analyzing relationships in large datasets and offer a tremendously flexible data structure.

As many developers can attest, one of the most tedious parts of applications dependent on relational databases is managing and maintaining the database schema. While relational databases are often the right tool for the job, there are some limitations – particularly the time and risk involved in adding to or updating the model – that have opened up room to use alternatives or, at least, to consider complementary data storage solutions. Enter NoSQL!

When NoSQL databases such as MongoDB and Cassandra came along, they brought with them a simpler way to model data as well as a high degree of flexibility – or even a schema-less approach – for the model. While document and key-value databases remove many of those time and effort hurdles, they were mainly designed to handle simple data structures.

However, the most useful, interesting, and insightful applications require complex data and a deeper understanding of the connections and relationships between different data sets.

For example, Twitter’s graph database – FlockDB – more elegantly solves the complex problem of storing and querying billions of connections than their prior relational database solution. In addition to simplifying the structure of the connections, FlockDB also ensures extremely fast access to this complex data. Twitter is just one use case of many that demonstrate why graph databases have become a draw for many organizations that need to solve scaling issues for their data relationships.

Graph databases offer a blend of simplicity and speed, all while giving data relationships first-class status.

While fast access to complex data at scale is a primary driver of graph database adoption, another reason is that they offer the same tremendous flexibility found in so many other NoSQL options. The schema-free nature of a graph database permits the data model to evolve without sacrificing any of the speed of access or adding significant, costly overhead to development cycles.

With the intersection of graph database capabilities, growing interest in graph databases, and the trend toward more connected big data, graph databases speed up both applications and the overall development cycle – and they are well positioned to grow into a leading alternative to relational databases.

Graph Kit for Ruby Part 3: Neo4j, Spree – Engine Yard deployment

Welcome to the third and final installment of the Graph Kit for Ruby post series.  Part 1 kicked the series off with a brief look at the idea of a graph database and some description of the Spree online store I planned to enhance with a graph. Part 2 went in depth with the addition of a graph-powered product recommendation system to a Spree online store.  In this final entry we’ll learn how to tweak our Spree + Neo4j store to deploy it to a production server on Engine Yard Cloud.

Provisioning

Engine Yard deployment of the Spree application worked in three major phases: provisioning the server, configuring the server, and pushing the code.  Provisioning runs for ten minutes or so, and then you have a new server running.  Next up – SSH into the server to do the last-mile config before your first deployment.

Oops!  My new server didn’t have my SSH keys and I couldn’t figure out an easy way to get them installed after the provisioning.  Since I was still in a happy prototyping mode I just deleted the server and then uploaded my SSH keys to my Engine Yard account under Tools -> SSH Public Keys -> Add a new SSH public key.  You’ll want to do the same if you’re following along at home.  If you don’t have a key yet, I recommend GitHub’s explanation on what SSH keys are and how to get one.  Once you’ve got your keys uploaded you can safely move on to the ‘boot a new server’ part of the Engine Yard setup.

[Screenshot: the Engine Yard control panel]

The Engine Yard cloud servers look to be hosted somewhere on Amazon Web Services.  Once I got my keys sorted out I created a new application in the control panel and named it graphkit_ruby. I chose some pretty standard Rails app defaults – the latest available version of Ruby, the latest available version of Postgres, and Passenger as the web server.  Engine Yard does offer SSL for real production stores but I didn’t bother since I’m not planning to sell these fake pet products.

Configuration

Using environment variables for configuration on Engine Yard

Engine Yard’s provided us with an app server and an RDBMS, which covers the basics of Spree.  To get our new graph-powered recommendation engine running we’ll also need access to a production graph database.  I signed up for a free trial database from the Graph Story front page.  To integrate our external Graph Story Neo4j database with Engine Yard we’ve got to have a nice safe way to store our database credentials and pass them to the Rails app at boot time.  I’ve gotten in the habit of using environment variables to configure my production applications so I can keep such secrets out of the codebase.  Newer versions of Rails support this practice with the addition of a secrets.yml file, but in this case I found it easiest to just use my own env.custom file with the dotenv gem.

To do the same for your app, add the dotenv gem to your Gemfile and then you’ll be able to read environment variables from a text file at run time. This wound up working well with Engine Yard – I just put the file in a shared config folder that is consistently available to the app from one deploy to the next.

We’ll force Rails to load our environment variables from config/env.custom at boot time by setting up a config/boot.rb file that preloads our variables:

[Screenshot: config/boot.rb]
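The screenshot isn’t reproduced here, so here’s a minimal sketch of what that boot-time preload could look like, assuming the dotenv gem’s standard API; the exact contents of the repo’s boot.rb may differ:

# config/boot.rb -- a sketch, not the exact file from the Graph Kit repo
ENV['BUNDLE_GEMFILE'] ||= File.expand_path('../../Gemfile', __FILE__)

require 'bundler/setup' # Set up gems listed in the Gemfile.

# Preload environment variables from config/env.custom (silently skipped if missing)
require 'dotenv'
Dotenv.load(File.expand_path('../env.custom', __FILE__))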

I .gitignored this file full of configuration and secrets so the one I’m using locally won’t be automatically pushed to GitHub or to Engine Yard. We’ll push it up to our EY server with scp:

scp config/env.custom my-ey-server-name:/data/my-app-name/shared/config/

Note that I was able to omit the full EY login string because I have my EY server hostname and credentials set up locally in ~/.ssh/config. If you don’t do that you’ll have to spell out the connection info like scp filename deploy@ey-server-ip-address:destination-folder/ instead.  That shared config directory is automatically symlinked into the config subdirectory of each new deployment to EY.

unnamed

To integrate this custom environment setup with Rails I went ahead and created a custom Neo4j initializer that teases apart a database URL-style configuration into the sort of thing that the Neo4j gem is actually looking for.  This means that I can punch in a NEO4J_URI variable of the form https://username:password@autogenerated-hostname.do-stories.graphstory.com:portnumber and Rails will automatically connect to my remote database.  With a fallback of localhost:7474 we can seamlessly switch between local Neo4j in dev mode and our actual Graph Story hosted database in production.  Speaking of which, you’ll want a free hosted Neo4j database of your own.  You can of course sign up from the Graph Story home page. Here’s what the connection info looks like from within my Graph Story admin panel – I copied the server connection information from here into my env.custom and formatted it into a NEO4J_URI string that I configured Rails to recognize via my Neo4j initializer file.

[Screenshot: connection info in the Graph Story admin panel]
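As a rough sketch of what that initializer might look like (the file name, the URI parsing, and the Neo4j::Session.open call are assumptions based on the neo4j/neo4j-core gems of that era, not code copied from the Graph Kit repo):

# config/initializers/neo4j.rb -- an illustrative sketch only
require 'uri'

# Fall back to a local dev database when NEO4J_URI isn't set
neo4j_uri = URI.parse(ENV.fetch('NEO4J_URI', 'http://localhost:7474'))

session_options = {}
if neo4j_uri.user
  # Tease the credentials out of the URL-style configuration
  session_options[:basic_auth] = { username: neo4j_uri.user,
                                   password: neo4j_uri.password }
end

server_url = "#{neo4j_uri.scheme}://#{neo4j_uri.host}:#{neo4j_uri.port}"
Neo4j::Session.open(:server_db, server_url, session_options)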

Creating a production secret token to sign cookies

Rails 4.1 uses a secrets.yml file that is .gitignored much like our above env.custom to hold production secrets. I have never messed with those myself but I did notice that it was looking for ENV["SECRET_KEY_BASE"] to set a production secret token for signing sessions. Let’s go ahead and generate one of those and tack it on to the production secret file we already created and then we’re (almost) in business.

[Screenshot: generating the secret key on the server]
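If you’d rather not squint at a screenshot: in Rails 4.1, rake secret just prints SecureRandom.hex(64), so either approach generates a suitable value. A quick Ruby equivalent you can run on the server, then paste into config/env.custom by hand:

# Run from irb or rails console on the server; append the output line
# to the shared config/env.custom file
require 'securerandom'

puts "SECRET_KEY_BASE=#{SecureRandom.hex(64)}"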

Deploying the code and seeding the database

Setting the production secret was the last step in getting my EY environment to play nice with Spree!  From there I clicked the “deploy HEAD” button in my EY panel and it pulled up the latest code from the Graph Kit Ruby repository on GitHub.  Once the code was finally deployed and the app was running I went into my Spree console and ran my database seeds to get an admin user created and to gin up all of those pretend products for our sample data.  That’s RAILS_ENV=production rake db:seed from within your app’s deployment directory on the server. Mine was /data/graphkit_ruby/current as shown in the secret key screenshot above.

[Screenshot: the finished store running on Engine Yard]

Next Steps for a Real-World Project

Asynchronous Data Processing

For a high-performance production application you wouldn’t really want end users to wait for the round trip between Engine Yard and Graph Story every time we log a new purchase event to the graph. It’d be much smoother to use a background job to send that data over. I’d use Sidekiq if this were a client project – it’s a great Ruby library for background job processing and it comes with a nice job status visualizer.  By offloading the graph writes to a background job you allow the web app to respond that much faster.  It’s common to do the same with transactional emails and any post-order processing in a high-volume Spree site.
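A hypothetical worker for that might look like the sketch below; the class name and wiring are mine, not code from the Graph Kit repo, and it assumes the standard Sidekiq::Worker API:

# app/workers/graph_purchase_worker.rb -- an illustrative sketch only
class GraphPurchaseWorker
  include Sidekiq::Worker

  def perform(user_id, product_id)
    user    = Spree::User.find(user_id)
    product = Spree::Product.find(product_id)

    # The same graph write as log_to_graph, just off the request cycle
    return unless user.try(:graphed)
    user.graphed.purchases << product.graphed
  end
end

# At purchase time, enqueue instead of writing synchronously:
#   GraphPurchaseWorker.perform_async(user.id, product.id)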

Richer Recommendations

Once you get started down the road of tracking purchase events you quickly realize there’s lots of other data you can start tracking to use for better recommendations. Here are a few ideas: “users who looked at this also looked at that”, “users in your area also purchased this”, “users who bought this often buy that in the same order”.  You can also look at copying more of your product and user metadata over to your graph nodes in order to query on product characteristics or user demographics.

Now that you’ve seen how straightforward it is to model nodes and relationships with Neo4j you can imagine how you might start layering your own user location data or per-cart data into your graph for richer recommendations.  I hope you’ve had as much fun reading this series as I did writing it!

Graph Kit for Ruby Part 2: Bootstrapping a Spree Store with Neo4j Integration

Welcome to the second installment of the Graph Kit for Ruby series, which covers setting up a Ruby project and Neo4j integration.  In the first post I described the plan for the project — to showcase the ease of use and business value of graph databases in the context of a Ruby project.  Today we’re digging into the implementation of a Neo4j-backed product recommendation engine.  This recommendation engine will sit atop an online store built with the Rails-based e-commerce project Spree.

Brief note on prerequisites and recommended experience level

This post assumes that the reader has a basic understanding of the Ruby ecosystem.  You’ll need a newer version of Ruby (2.0+ preferred) and the bundler utility.  You might also need to be able to install some packages on your system like the PostgreSQL and Neo4j database engines.  I used Ubuntu to prototype this but you should be able to get by fine with any recent version of Linux or OSX.  The finished project is published on Graph Story’s GitHub account as the Graph Kit for Ruby, so feel free to dig in for specific implementation details.  If at any point you get stuck feel free to reach out to the Graph Story team for help.  Here we go!

Starting a new Spree Project

You’ll want to follow the recommendations in the official Spree Getting Started Guide to get your workspace set up.   Note that you’ll start with the usual rails new installer before getting into Spree-specific setup.  Here are the basic shell commands to get your Rails project started with Spree loaded into its Gemfile:

# get some gems
gem install rails -v 4.1.6
gem install bundler
gem install spree_cmd

# make a new rails project
rails _4.1.6_ new graph-kit-ruby
cd graph-kit-ruby

# install spree into the gemfile and run its generators
spree install --auto-accept

Additional Gems

Look through the completed project’s Gemfile on GitHub and you’ll see several other important gems:

  • neo4j. This Object-Graph Mapper (OGM) gem aims to provide full-featured Neo4j access with an ActiveRecord feel. Rails devs with minimal Neo4j experience should appreciate the familiarity.
  • pg — You’ll need an RDBMS for Spree, and I prefer PostgreSQL.
  • dotenv — I’m going to deploy this to Engine Yard, and dotenv came in very handy. More on that in post 3.
  • jazz_hands — This one provides pry and several other convenience tools for working at the Ruby console. I find Pry in particular to be very helpful when learning a new library such as neo4j.

Building a Recommendation Engine with Neo4j

My idea here was to model Users, Products, and Purchases in a graph so that we could easily identify purchasing patterns to use in our recommendations. Spree already has a Spree::User and a Spree::Product model. Purchases are modeled in the RDBMS as Spree::LineItem rows that associate a product with a specific order. I’ll create a Recs module inside my project and give it User and Product models that are linked together by purchase histories.  The Product model is where most of the graph action happens, so let’s focus there.

Designing a “product” node type using the neo4j OGM

[Screenshot: the Recs::Product model]

Spree gives us a products table in Postgres out of the box, so what we want to do is set up a Product node in Neo4j for each product row in Postgres.  With a bit of work we can ensure that each new Postgres product automatically creates a matching graphed product node.  I created a :graphed method on the Spree::Product model that finds or creates a matching Product in Neo4j on demand by calling into the self.from_spree_product(spree_product) method shown in the screenshot above (a rough sketch of the model also follows the list below).

Important things to notice in the Recs::Product model

  • Each node has a slug property – this is the unique identifier Spree uses in product URLs and in its relational database to distinguish one product from another.  Adding that to the companion node ensures I can link products from one database to the other and back.
  • The model class gets most of its functionality by including Neo4j::ActiveNode.  This gives us ActiveRecord-like semantics for finding, creating, updating, and removing nodes.
  • It has_many purchases as connections to User nodes.  This is where the edges in our graph come from and it’s how we’ll make meaningful queries against our graph.
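Putting those pieces together, a minimal version of the node model might look like the following; the property options, relationship type, and find_or_create_by call are assumptions about the neo4j gem’s ActiveNode API rather than code copied from the repo:

# app/models/recs/product.rb -- a rough sketch, not the repo's exact code
module Recs
  class Product
    include Neo4j::ActiveNode

    # Mirrors Spree's unique product slug so we can hop between databases
    property :slug, type: String

    # Incoming purchase edges from Recs::User nodes
    has_many :in, :purchases, type: :purchased, model_class: 'Recs::User'

    # Find or create the graph node that mirrors a Postgres product row
    def self.from_spree_product(spree_product)
      find_or_create_by(slug: spree_product.slug)
    end
  end
end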

Breaking down a Cypher query generated with the neo4j gem

The most interesting thing about this product model is of course the Cypher query (Cypher is Neo4j’s declarative query language) that allows us to surface related purchase data.  Let’s break it down line by line:

This query is executing in the context of an already selected Product node. We’ll refer to this starting point in the query itself as :product.

query_as(:product).

Identify all products which have been bought by users who have also bought the :product this query is built on.

  match("product<--(user:`Recs::User`)-->(other_product:`Recs::Product`)").

Discard products in the result set that are equivalent to the initial :product.

  match("product<--(user:`Recs::User`)-->(other_product:`Recs::Product`)").

Limit our results to a few products so that Neo4j and Spree can spit out results faster.

  limit(limit).

Return an array of unique products that match the other_product node in our match statement. This means any products bought by people who bought :product should be a valid result.

  pluck('DISTINCT other_product')

If you squint at that query you can see a sort of (Product)<--(User)-->(Other Product) relationship going on in the match statement.  “Queries that look like whiteboarded graphs” seems to be a design goal of the Cypher query language used in Neo4j. As a new user I can say it is pretty easy to get started with.
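Assembled into a single method on Recs::Product (the method name and default limit here are assumptions; check the repo for the real thing), the whole chain from the breakdown above reads:

# A sketch of the full query-builder chain on Recs::Product
def users_also_bought(limit = 3)
  query_as(:product).
    match("product<--(user:`Recs::User`)-->(other_product:`Recs::Product`)").
    where('NOT (other_product = product)').
    limit(limit).
    pluck('DISTINCT other_product')
end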

Automatically logging new purchase events as connections in our graph

Once we implement our Recs::User and Recs::Product model with their ‘purchase’ connection type, all we need to do is automate the logging of purchase events from Spree and PostgreSQL over to our Neo4j database.  Here’s how to do that:

def log_to_graph
  # Bail out if there's no matching graphed user for this purchase
  return unless user.try(:graphed)
  # Record a purchase edge from the graphed user to the graphed product
  user.graphed.purchases << product.graphed
end

Let’s make some fake yet interesting purchase history data

In order to demonstrate Neo4j’s ability to easily unearth interesting connections in our data set, I decided to create some pretend customers with very consistent purchasing habits. For instance, a Mr. Green might only buy green products, while a Ms. Pillow might buy any pillow in the store regardless of its color. You can see the methodology used to generate this sample data in the graph-kit-ruby repository on GitHub.

[Screenshot: generated sample purchase data]

Inspecting the data in the Neo4j browser with the Cypher query language

One of my a-ha moments when getting into Neo4j was discovering the built-in web server and its visualization tools. After you’ve stuffed some nodes and relationships into your dev database you can visualize them by poking around at http://localhost:7474.  You can click on the “purchases” relationship to see a visualization of the entire product purchase history graph.  I wanted to dig in a bit deeper though so I got my Recs::Product model to give me some Cypher help.  You can learn more about Cypher on Neo4j’s site.

[Screenshot: using :to_cypher at the Rails console]

Using only the neo4j gem’s built-in methods and the Cypher syntax we covered above I’ve isolated a single Product node and gotten a good lead on how to look it up by hand using my own Cypher query.  Note the :to_cypher method in the screenshot above which generates a working query from your Ruby code just like :to_sql in ActiveRecord.  Unfortunately for me pasting that directly into the Neo4j browser didn’t quite work, but it got me close enough.  I tweaked the WHERE clause to look for product.slug = 'red-shirt' rather than the parameterized product_id query :to_cypher gave me and then I added RETURN product, user, other_product to the end.  Once I’d fixed up the Cypher I was able to get a neat visualization of the red shirt, the one user in my test data who’d bought it, and all the other things that user purchased.  Shirts, all shirts!

[Screenshot: the red shirt purchase graph in the Neo4j browser]

Integrating our product recommendations with the Spree storefront

Now that we’ve generated our sample data and figured out how to query Neo4j for simple product recommendations, let’s add them to our storefront and call it a day.  I wired up the product recommendations directly into the Spree::Product model as :users_also_bought.  That delegates to the :users_also_bought method from the relevant Recs::Product node and returns the first three results.  Armed with that easy lookup I dropped a new section into the product detail view with a _users_also_bought.html.erb partial template:

<% if (products = product.users_also_bought).any? %>
  <div class='users-also-bought'>
    <h3>Users also bought:</h3>
    <%= render partial: 'spree/shared/products', locals: { products: products } %>
  </div>
<% end %>

My favorite thing about this partial is that it managed to leverage a built-in Spree products partial, and all I have to do is pass it a local variable named products to which I’d assigned the results of product.users_also_bought.  There’s really nothing going on here other than looking up the data and passing it along to the built-in partial.
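For completeness, the :users_also_bought delegation on Spree::Product could be sketched roughly like this; it’s a decorator-style guess, not the repo’s actual wiring:

# app/models/spree/product_decorator.rb -- an illustrative sketch only
Spree::Product.class_eval do
  # The companion node in Neo4j for this Postgres product row
  def graphed
    Recs::Product.from_spree_product(self)
  end

  # Delegate to the graph query and map the resulting nodes back to Spree
  # products, keeping the first three for the storefront pane
  def users_also_bought
    graphed.users_also_bought.first(3).map do |node|
      Spree::Product.find_by(slug: node.slug)
    end.compact
  end
end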

[Screenshot: product recommendations in place on the product page]

Final post: Deploying our Ruby graph kit to Engine Yard and Graph Story

For the third and final post in this series we’ll cover the sysadmin work required to deploy your working graph-enhanced Spree site to production.  We’ve chosen to deploy to Engine Yard Cloud, so most of the post will focus on configuration specific to their environments.  You’ll also see how to switch from a local Neo4j server in development to a production-ready Graph Story server by layering in the appropriate connection strings.

Ruby Neo4j Part 1: Let’s Build a Recommendation Engine for a Spree Store

Hi there! My name is Daniel and I’m a consulting Ruby developer here in Memphis, Tennessee. The past year or two I’ve mostly focused on building online stores.  At Graph Story’s invitation I recently set out to build a Ruby project demonstrating a simple yet valuable integration with a graph database.  The little I knew about graphs before this project was all theoretical so I decided to start by learning more about Neo4j, the open source graph database technology behind Graph Story’s service offerings.  Poking around online to find graph database resources I wound up at Neo4j’s homepage and found that some of its founders published a book with O’Reilly. They were giving away copies in exchange for email addresses. Sold! The book mentioned several good use cases for graph databases, but recommendation engines jumped off the page for me.

You’ve undoubtedly seen automated recommendation tools in action around the web.  The first one that came to mind for me is the one at the bottom of every product page on Amazon saying “Customers Who Bought This Item Also Bought…”.  Turns out we can make our own fairly easily with a graph database!

[Screenshot: recommendation suggestions around the web]

What will our graph kit starter project do?

We’re going to have an online store with Ruby & Neo4j.  It’s going to sell pet products in various colors and product families.  You might see a pink leash, a yellow collar, or a red pillow.  We’ll whip up some artificial customers with interesting buying patterns and have them purchase a few hundred things from our online pet supply store.  Once that’s done, we’ll be able to dig into the graph our purchase activities have generated and uncover some useful data about customer purchasing preferences.  Armed with this new data, we’ll set up an Amazon-style “customers who bought pink pillows also bought pink leashes” pane on every product page of our store in hopes of increasing sales with relevant recommendations.

[Screenshot: the pink shirt purchase graph]

You can see in the image above how a pink shirt appealed to one customer who collects pink things but it also appealed to another customer who collects shirts.  Digging these connections out of our graph will make it easier for us to present customers with targeted recommendations of things they might actually want to buy.

Where can I get the code?

All code for this project is shared on Graph Story’s GitHub account: The Graph Kit for Ruby is a mostly stock Spree store with some simple Neo4j integrations that will give you a head start on building your own high-value graph solutions.  If you read the source code closely you can probably figure out where we’re going in posts 2 and 3 in this series 😉

How will we deploy it?

We’re going to use Engine Yard to run our app server and a PostgreSQL database to store the basic product catalog and order history.  New products and new orders are also going to be streamed directly to a Neo4j database so that we can build our recommendations in real time.  We’ll take a fairly naive approach to this streaming in the interests of keeping this series of posts easily digestible.

Up next: building the sample store and integrating it with Neo4j

Stay tuned for post 2 in this series, “Bootstrapping a Spree Store and Integrating With Neo4j,” where I’ll walk through the steps required to create your own Spree store and load up those fake customers, products, and purchases.  Once that’s in place we’ll layer in our “customers also bought” widget on the Spree frontend.  For post 3 we’ll walk through the process of deploying the working Spree app to an Engine Yard cloud server that integrates with a free trial graph database provided by Graph Story.
