GraphStory helps AgSmarts enable the next agricultural revolution: The Agri Internet of Things

Over the last 200 years, the agriculture industry has made tremendous strides in producing more crops with less labor and better agricultural practices overall. The amount of labor needed to grow 100 bushels of corn has fallen from 80 hours in 1850 to 2 hours today. In the same period, the yield for one acre has risen from 40 bushels to over 160 in good years.

Even with these advances in agriculture, farmers must continue to innovate to be successful. They manage their fields while tracking dozens of variables, from changes in weather patterns to the impact of seeding patterns. In addition, farmers in California, as well as in other areas of the US, have met with the harsh realities of drastic drought conditions. Water scarcity is only one of the effects of a rapidly changing climate that is affecting farming.

High Tech in Agricultural Progress

AgSmarts, a two-year-old startup based in Memphis, TN, is focused on helping farmers by providing unprecedented and specific crop information that, in turn, lowers their operational expenses and optimizes crop yields. AgSmarts’ Precision Ag platform includes moisture-sensing technology, predictive analytics, and farm equipment automation that together represent a revolution in data-driven agriculture.

Farmers are furnished with smart sensors that connect wirelessly to the cloud and collect and analyze data, from real-time weather to crop conditions and other variables in their fields. This data helps the crops tell the story of how farm management decisions and changing conditions affect them and, ultimately, can provide direction for producing greater yields.

Big Data helps Agronomics

Wireless technologies and the internet of things have made it possible to create affordable field sensors that constantly measure critical data like temperature and soil moisture. Data collected by the sensors is uploaded to the cloud. AgSmarts combines the sensor data with weather data, crop growth and many other data points and stores them in a graph database. Graph databases are optimized for quickly querying the relationships within large datasets, which helps turn that data into actionable insights.

“Graph Story has been instrumental in getting our cloud application up and running in such a short time. Their hosted graph database gives us all the advantages of a scalable, lightning fast, modern big data solution without any of the hassles of maintaining it and worrying about scalability,” says Clayton Plymill, CTO and co-founder of AgSmarts.

In partnering with AgSmarts, Graph Story has helped with the design and implementation of AgSmarts’ groundbreaking application as well as with hosting and support. Our service allows AgSmarts to focus on their customers and their core business, and it provides an affordable, scalable platform to help further their goals of providing amazing technology in agriculture.

To get started with the Graph Story platform, sign up for our free trial or contact us with any questions!

_______________________

About AgSmarts: Based in Memphis, TN, AgSmarts is a Precision Ag technology company that offers remote wireless sensing, predictive irrigation and crop management analytics, and equipment automation that collectively represent a revolution in data-driven agriculture. AgSmarts’ platform combines hardware and software solutions into a versatile, powerful and cost-effective suite of tools that producers, researchers and agronomic consultants can use today in the struggle to conserve natural resources, control operational costs and maximize crop yields. For more information about AgSmarts, please visit www.agsmarts.com.

 

Uncovering Open Source Community Stories with Neo4j

Every dataset has a story to tell — we just need the right tools to find it.

At Graph Story, we believe that graph databases are one of the best tools for finding the story in your data. Because we are also active members of several open source communities, we wanted to find interesting stories about those communities. So, we decided to look at package ecosystems used by developers.
The first one we tackled was Packagist, the community package repository for PHP. Nearly 20,000 maintainers have submitted over 60,000 packages to Packagist, which gives us a lot of interesting data to investigate.

How We Used Neo4j to Graph the Packagist Data

Collecting this data and getting it into Neo4j was relatively straightforward.

One HTTP endpoint on the Packagist site returns a JSON array of all the package names. We iterated over that, and made individual calls to another endpoint to retrieve a JSON hash for each package, which includes both base package data and information on each version of the package, including what packages a given version requires.

The data model for our initial version was pretty straightforward. We have three node labels:

  • Package
  • Maintainer
  • Version

and five relationship types:

  • HAS_VERSION
    (Package)-[:HAS_VERSION]->(Version)
  • MAINTAINED_BY
    (Package)-[:MAINTAINED_BY]->(Maintainer)
  • REQUIRES
    (Version)-[:REQUIRES]->(Package)
  • REQUIRES_DEV
    (Version)-[:REQUIRES_DEV]->(Package)
  • SUGGESTS
    (Version)-[:SUGGESTS]->(Package)
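
As a rough illustration of how one package's JSON hash could be flattened into the relationships listed above, here is a minimal Python sketch. It is not the import code we ran. The payload shape (the "maintainers" and "versions" keys, and the "require", "require-dev" and "suggest" keys inside each version) is an assumption, the `name@version` identifier for Version nodes is hypothetical, and the sample package is invented.

```python
from typing import Iterator

# Maps keys in a version's JSON to the relationship types in the list above.
# These key names are an assumption about the payload shape.
DEPENDENCY_KEYS = {
    "require": "REQUIRES",
    "require-dev": "REQUIRES_DEV",
    "suggest": "SUGGESTS",
}


def package_to_edges(package: dict) -> Iterator[tuple[str, str, str]]:
    """Yield (source, relationship_type, target) triples for one package."""
    name = package["name"]
    for maintainer in package.get("maintainers", []):
        yield (name, "MAINTAINED_BY", maintainer["name"])
    for version, data in package.get("versions", {}).items():
        # Hypothetical identifier for a Version node.
        version_id = f"{name}@{version}"
        yield (name, "HAS_VERSION", version_id)
        for key, rel_type in DEPENDENCY_KEYS.items():
            # "or {}" tolerates a key that is missing or null.
            for target in (data.get(key) or {}):
                yield (version_id, rel_type, target)


# Invented payload, for illustration only.
sample = {
    "name": "acme/widgets",
    "maintainers": [{"name": "alice"}],
    "versions": {
        "1.0.0": {
            "require": {"acme/logger": "^1.0"},
            "require-dev": {"acme/test-kit": "^2.0"},
            "suggest": {"acme/extras": "Adds optional helpers"},
        }
    },
}

edges = list(package_to_edges(sample))
for edge in edges:
    print(edge)
```

Each triple maps directly onto one relationship in the graph, so the same loop could feed whatever write path is used to load Neo4j.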

This certainly isn’t a complete schema to represent everything within the Packagist ecosystem, but it let us do some interesting analyses:

  1. Which packages are required the most by other packages?
  2. Which maintainers have the most packages?
  3. Which maintainers have packages that are required the most?
  4. Which maintainers work together the most (packages can have multiple maintainers)?
  5. What are the shortest paths between two given packages, or two given maintainers?
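
To make questions 1 and 4 concrete, here is a small in-memory Python sketch of the counting involved. It stands in for the graph queries rather than reproducing them: the package names, maintainer names and the `name@version` keys are all invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Invented data: the packages each version requires, and each package's maintainers.
requires = {
    "acme/app@1.0.0": ["acme/logger", "acme/http"],
    "acme/http@2.1.0": ["acme/logger"],
    "acme/cli@0.3.0": ["acme/logger", "acme/http"],
}
maintainers = {
    "acme/app": ["alice", "bob"],
    "acme/http": ["alice", "bob", "carol"],
    "acme/logger": ["carol"],
}


def most_required(requires):
    """Question 1: count how many distinct packages require each package."""
    seen = set()
    for version_id, deps in requires.items():
        requiring_package = version_id.split("@")[0]
        for dep in deps:
            # A set keeps several versions of one package from counting twice.
            seen.add((requiring_package, dep))
    return Counter(dep for _, dep in seen)


def top_collaborators(maintainers):
    """Question 4: count how many packages each pair of maintainers shares."""
    pairs = Counter()
    for names in maintainers.values():
        # Sorting makes ("alice", "bob") and ("bob", "alice") the same pair.
        pairs.update(combinations(sorted(names), 2))
    return pairs


required_counts = most_required(requires)
pair_counts = top_collaborators(maintainers)
print(required_counts.most_common(2))
print(pair_counts.most_common(1))
```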

Our Findings

You can see our results so far at packagist.graphstory.com.

Some of what we found was expected: certain well-known open source component libraries get required the most, like doctrine/orm and illuminate/support.

It gets more interesting when examining maintainers, though. Some are high profile folks in the PHP community, like fabpot and taylorotwell, but some are people with whom we weren’t as familiar. It certainly made us re-examine what we thought we knew about the PHP community – it’s not always folks who are speaking at conferences that are making big contributions.

The shortest path analyses were interesting as well. There were a few packages that showed up in these paths over and over to tie together maintainers and packages, such as psr/log. “Keystone packages” might be a good term for these, because they seem to join and support the PHP open source community again and again.
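
The idea of a package that keeps turning up on shortest paths can be sketched with a plain breadth-first search. This is a Python stand-in for the graph query, not the query we ran, and the maintainers and packages below (including "shared/log") are invented.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: return one shortest path as a list of nodes, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> the node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already reached by a path at least as short
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the back-pointers from goal to start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Invented edges: maintainer-to-package and package-to-required-package links,
# treated as undirected for path finding.
edges = [
    ("alice", "acme/app"),
    ("acme/app", "shared/log"),
    ("acme/cms", "shared/log"),
    ("bob", "acme/cms"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

path = shortest_path(graph, "alice", "bob")
print(path)
```

In this toy graph, the only route between the two maintainers runs through "shared/log", which is the role a keystone package plays.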

A Cypher Example: Finding Top Maintainers by Packages

Here’s one example Cypher query we ran to find the top Packagist maintainers by package count:

MATCH (m1:Maintainer)<-[:MAINTAINED_BY]-(:Package)
WITH m1, COUNT(*) AS count
WHERE count > 1
WITH m1, count
ORDER BY count DESC
RETURN m1.name AS name, count
LIMIT { limit }
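
For readers less familiar with Cypher, the steps of the query above (count, filter, sort, limit) can be mirrored in a few lines of Python. This is only an in-memory analogue under invented data, not how the results on the site were produced.

```python
from collections import Counter


def top_maintainers(maintained_by, limit):
    """Mirror the query: count packages per maintainer, keep counts > 1, sort, limit."""
    counts = Counter(maintainer for _package, maintainer in maintained_by)
    rows = [(name, count) for name, count in counts.items() if count > 1]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


# Invented (package, maintainer) pairs, one per MAINTAINED_BY relationship.
maintained_by = [
    ("acme/app", "alice"),
    ("acme/http", "alice"),
    ("acme/cli", "alice"),
    ("acme/http", "bob"),
    ("acme/logger", "bob"),
    ("acme/logger", "carol"),
]

rows = top_maintainers(maintained_by, limit=10)
print(rows)
```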

See the results of this query and others on packagist.graphstory.com.

Why We Used a Graph Database

Much of what we’ve done would be possible with an RDBMS or a document database, so why do it in a graph database – specifically Neo4j?

We found three major upsides while working on this project:

  1. It is much easier to map out data and relationships. Making relationships work in RDBMSes, even in simple cases, is harder, and they are significantly more difficult to change down the road. Compared to popular document databases, Neo4j relationships are handled in the database, so we don’t have to maintain them with application logic.
  2. Discovering how people and packages are connected is much easier and faster than with RDBMSes and popular document databases. Cypher and the graph model make it easy to get the data we want without complex SQL joins or a wrapper script in another language.
  3. Trying new queries to explore the data is so convenient with Neo4j’s web interface. It’s quick and easy to prototype and profile from there, and then copy and paste the Cypher into your app.

We’re obviously big believers in graph databases at Graph Story, but this is a fun project that highlights a lot of the advantages of Neo4j. We found a number of interesting stories in Packagist, and there are certainly more to uncover.

Moving Beyond the Social Graph: Neo4j Graph Database examples for ad-hoc analysis

If you’ve done any reading on graph databases, then you have likely come across the example of how graph databases can illustrate connections in a social media context. While graphs are excellent at handling social connections, this model just scratches the surface of the variety of use cases that can be addressed within a graph database.

Built-In Ad Hoc Analysis

Graph databases are an excellent match for experimental, or ad hoc, analysis. While many features in Neo4j enable this type of analysis, we will focus on two capabilities available in the Neo4j browser for investigative work: visualizing data, and adding and removing relationships in a graph schema.

Seeing is Believing

When it comes to understanding data, few tools offer the visualization advantages that Neo4j does. While the Neo4j browser also offers a tabular view of the data, its visualization features give data analysts a more intuitive representation and can reduce the critical time-to-insight factor of data analysis. Consider the following examples of how data can be illustrated with the out-of-the-box utilities in Neo4j.

During a recent graph implementation for law enforcement analysts, we were able to see how theft incidents were related by location and time. In a few simple steps, we illustrated these incidents with visualizations that enabled analysts to summarize the events and make changes to patrol configurations. Here are a few Neo4j graph database examples:

usecase1

From the illustration above, analysts were able to quickly see the difference in theft activity between days (2 on 4/20 and 6 on 4/7). Using the same visualization, we changed the displayed label of the data to expose the value of the theft activity.

usecase2

From this illustration, analysts were quickly able to see that $290 worth of bikes were stolen on 4/20/2014. When we added a relationship between events to look not only at thefts on the same day but also at thefts in the same hour, the following visualization showed analysts more detailed information about when these events occurred.

usecase3

From this illustration, analysts were quickly able to see that on 11/1/2010 at 2:00 pm, 3 theft events occurred. Finally, consider the following example, where a relationship was added to the data to match events that occurred at the same latitude and longitude.

usecase4

The illustration here shows theft events totaling $320 at LNG: -75.059492750000004 and LAT: 40.047509900000001. This data was eventually correlated with a geolocation database, and the exact location was identified as high risk and a candidate for increased patrols.
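
The summaries the analysts read off these visualizations (how many thefts, and how much value, per day or per location) amount to grouping events by shared properties. The Python sketch below shows that grouping in memory. The THEFT_DATE, LAT and LNG names follow the theft node properties used in the queries in this article, while the VALUE property name and all of the sample events are assumptions made up for illustration.

```python
from collections import defaultdict


def summarize(events, key_fields, value_field="VALUE"):
    """Group events by the given properties; return count and total value per group."""
    summary = defaultdict(lambda: {"count": 0, "total": 0})
    for event in events:
        key = tuple(event[field] for field in key_fields)
        summary[key]["count"] += 1
        summary[key]["total"] += event[value_field]
    return dict(summary)


# Invented events; "VALUE" is a hypothetical property name.
events = [
    {"THEFT_DATE": "2020-01-01", "LAT": 40.0, "LNG": -75.0, "VALUE": 100},
    {"THEFT_DATE": "2020-01-01", "LAT": 40.0, "LNG": -75.0, "VALUE": 50},
    {"THEFT_DATE": "2020-01-02", "LAT": 40.5, "LNG": -75.5, "VALUE": 25},
]

by_date = summarize(events, ["THEFT_DATE"])
by_location = summarize(events, ["LAT", "LNG"])
print(by_date)
print(by_location)
```

Coordinates are matched exactly here, so two readings that differ in the last decimal place would fall into different groups.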

Experimental Connectedness

As demonstrated above, adding relationships based on time and space to data is a powerful tool for experimental analysis in law enforcement. Being able to correlate when and where events take place allows these agencies to be proactive rather than reactive to theft events. So let’s take a look at how Neo4j enables this on-the-fly analysis.

With just 3 lines of code, we can add the relationship between theft events based on the theft date property of the theft node.

Create Theft Relationship Based on Date

MATCH (t:Theft), (u:Theft)
WHERE t.THEFT_DATE = u.THEFT_DATE
CREATE UNIQUE (t)-[:STOLEN_ON_SAME_DAY]->(u)

In this example, we first query the graph to find all theft events that have the same date and then create a relationship of type STOLEN_ON_SAME_DAY between them. In seconds, this operation made it possible to correlate these events and produce the visualization shown above.

Create Theft Relationship Based on Date and Hour

MATCH (t:Theft), (u:Theft)
WHERE t.THEFT_DATE = u.THEFT_DATE AND t.THEFT_HOUR = u.THEFT_HOUR
CREATE UNIQUE (t)-[:STOLEN_ON_SAME_DAY_AND_HOUR]->(u)

Just by adding one AND condition to the same-day statement, we were able to drill deeper into the relationship between theft events and uncover events that occurred during the same hour (represented by the :STOLEN_ON_SAME_DAY_AND_HOUR relationship).

Finally, let’s look at how we set up the relationship that correlates theft events that took place at the same latitude and longitude.

Create Theft Relationship Based on Geo-Location

MATCH (t:Theft), (u:Theft)
WHERE t.LAT = u.LAT AND t.LNG = u.LNG
CREATE UNIQUE (t)-[:STOLEN_AT_SAME_LOCATION]->(u)

Again, with a simple WHERE clause, we were able to link events by geolocation and represent the relationship as STOLEN_AT_SAME_LOCATION.
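
The three statements above differ only in which properties must match, and that pattern can be sketched once in Python. This is an in-memory analogue under stated assumptions, not a replacement for the Cypher: the "id" field and the sample events are invented, and unlike a pairwise comparison of every two nodes, the sketch groups events by key first and then links each pair of distinct events once.

```python
from collections import defaultdict
from itertools import combinations


def link_by(events, key_fields, rel_type):
    """Return (id_a, rel_type, id_b) for each pair of events sharing all key fields."""
    groups = defaultdict(list)
    for event in events:
        key = tuple(event[field] for field in key_fields)
        groups[key].append(event["id"])  # "id" is a hypothetical identifier
    links = []
    for ids in groups.values():
        # combinations() yields each unordered pair once and never pairs an id with itself.
        for a, b in combinations(ids, 2):
            links.append((a, rel_type, b))
    return links


# Invented events, for illustration only.
events = [
    {"id": 1, "THEFT_DATE": "2020-01-01", "THEFT_HOUR": 14, "LAT": 40.0, "LNG": -75.0},
    {"id": 2, "THEFT_DATE": "2020-01-01", "THEFT_HOUR": 14, "LAT": 40.5, "LNG": -75.5},
    {"id": 3, "THEFT_DATE": "2020-01-01", "THEFT_HOUR": 9, "LAT": 40.0, "LNG": -75.0},
]

same_day = link_by(events, ["THEFT_DATE"], "STOLEN_ON_SAME_DAY")
same_hour = link_by(events, ["THEFT_DATE", "THEFT_HOUR"], "STOLEN_ON_SAME_DAY_AND_HOUR")
same_place = link_by(events, ["LAT", "LNG"], "STOLEN_AT_SAME_LOCATION")
print(same_day)
print(same_hour)
print(same_place)
```

Adding a new kind of connection is then a matter of choosing a different list of key fields, which is the same on-the-fly flexibility the Cypher statements provide.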

Looking Ahead

From these cases, it should be clear that graphs can offer two very easy ways to analyze more than just social media data. After all, being able to relate data points by spatial and temporal elements is applicable to more areas than just law enforcement. From a retail perspective, this can lead to illustrations of product demand by store location and time of year. Apply the same relationships to customer purchasing, and you can gain insight into when and where customers are buying what products.

When you factor in the relative ease and speed with which this type of analysis can be performed, using graphs for analysis is a low-risk, high-reward exercise.
While social media projects get much of the limelight in graphs, the real power of graphs lies in moving beyond relational database constraints and putting the analytical strength of Neo4j to use.

With use case work and code provided by William Sharp.