Case 5: Business Intelligence
The two charts above show how often several terms relating to business intelligence (BI) co-occur with each other. The vertical axis shows the frequency of co-occurrence and the horizontal axis shows the years from 2000 to 2009. The left chart shows the concepts co-occurring with organizations. In 2006 there is a large spike for several pairs, highlighted in green. These correspond to data mining combined with the White House, NSA, AT&T, ACLU, EFF, and the like from the warrantless wiretapping scandal that year.
The chart on the right shows organizations co-occurring with other organizations during the same time period. Just like the left chart, in 2006 there is a large spike for several pairs highlighted in green, corresponding to the pairwise combinations of NSA, White House, AT&T, FBI, EFF, ACLU, CIA, and so on.
To show the overall relationships between BI terms, we then plotted the co-occurrence frequencies of each pair of terms in matrix graphs. Each term is placed on both the horizontal and vertical axes, and the intersection of a row and a column is the co-occurrence frequency of that pair. The graph on the top left shows the whole view of the BI terms, with color representing the degree of co-occurrence (pink is high, orange is medium, and white is low). The rows and columns are both clustered by their similarity, so terms with similar relationships are placed near each other.
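The matrix described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the graphs: it assumes two terms co-occur when they are tagged on the same document, the sample documents are invented, and the function names `cooccurrence_counts` and `to_matrix` are hypothetical. It does not reorder rows and columns by similarity.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered pair of terms shares.

    Assumption: a "document" is simply a list of the terms tagged on it.
    """
    counts = Counter()
    for terms in documents:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def to_matrix(counts, terms):
    """Build a symmetric matrix with the same terms on both axes."""
    index = {term: i for i, term in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in terms]
    for (a, b), n in counts.items():
        i, j = index[a], index[b]
        matrix[i][j] = n
        matrix[j][i] = n  # co-occurrence is symmetric
    return matrix


# Invented sample data, for illustration only.
documents = [
    ["data mining", "NSA", "AT&T"],
    ["data mining", "NSA"],
    ["IBM", "Oracle"],
]
terms = ["data mining", "NSA", "AT&T", "IBM", "Oracle"]
counts = cooccurrence_counts(documents)
matrix = to_matrix(counts, terms)
```

Each cell `matrix[i][j]` is the value that would be mapped to a color in the matrix graph; the diagonal stays at zero because a term is not paired with itself.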
As you can see, the densest, darkest, most interesting part is on the bottom right. In this subset we can identify several clusters. The black box in the top-right graph shows the large warrantless wiretapping cluster we saw before: data mining, NSA, CIA, FBI, White House, Pentagon, DOD, DHS, AT&T, ACLU, EFF, and the Senate Judiciary Committee. The bottom-left graph shows three clusters: two tech clusters and one finance cluster. The bottom-right black tech cluster contains the heavy hitters IBM (with its Cognos product), Microsoft, and Oracle, while the top-left green cluster contains Google, Apple, and Stanford. They straddle the finance cluster containing NASDAQ, NYSE, SEC, NCR, and MicroStrategy. The bottom-right graph shows another interesting cluster relating to the Air Force, Army, Navy, GSA, and (more loosely connected) UMD.
We can see these same relationships and clusters in the network visualization below. The node for each entity is sized by the number of other terms it co-occurs with and colored by clusters determined automatically with the Clauset-Newman-Moore algorithm. The width, color, and transparency of each edge reflect how frequently the two entities co-occur.
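To make the node sizing and clustering concrete, here is a rough sketch. The Clauset-Newman-Moore algorithm greedily merges the pair of communities that most increases modularity; the version below is a naive form of that same greedy merging, without the efficient data structures that make CNM fast, and it is not the implementation behind the visualization. Weighting edges by co-occurrence count is an assumption, the edge weights are invented, and the function names are hypothetical.

```python
def neighbor_counts(edge_weights):
    """Node size: the number of distinct terms each term co-occurs with."""
    neighbors = {}
    for u, v in edge_weights:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {node: len(ns) for node, ns in neighbors.items()}


def greedy_modularity_communities(edge_weights):
    """Naive greedy modularity merging on an undirected weighted graph.

    edge_weights maps (u, v) tuples with u != v to co-occurrence counts.
    Returns a list of sets of node names.
    """
    m = sum(edge_weights.values())  # total edge weight
    degree = {}   # community label -> summed weighted degree
    between = {}  # frozenset({a, b}) -> edge weight between communities a and b
    for (u, v), w in edge_weights.items():
        degree[u] = degree.get(u, 0) + w
        degree[v] = degree.get(v, 0) + w
        key = frozenset((u, v))
        between[key] = between.get(key, 0) + w
    members = {node: {node} for node in degree}  # start with one node per community

    while True:
        best_gain, best_pair = 0.0, None
        for pair, e_ab in between.items():
            a, b = tuple(pair)
            # Modularity change from merging communities a and b
            gain = e_ab / m - degree[a] * degree[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge improves modularity
        a, b = best_pair
        members[a] |= members.pop(b)
        degree[a] += degree.pop(b)
        merged = {}
        for pair, w in between.items():
            x, y = (a if c == b else c for c in pair)  # relabel b as a
            if x != y:  # edges now inside the merged community are dropped
                key = frozenset((x, y))
                merged[key] = merged.get(key, 0) + w
        between = merged
    return list(members.values())


# Invented weights: two tight triangles joined by one weak link.
edge_weights = {
    ("NSA", "AT&T"): 5, ("NSA", "EFF"): 5, ("AT&T", "EFF"): 5,
    ("IBM", "Microsoft"): 5, ("IBM", "Oracle"): 5, ("Microsoft", "Oracle"): 5,
    ("EFF", "Microsoft"): 1,
}
sizes = neighbor_counts(edge_weights)
communities = greedy_modularity_communities(edge_weights)
```

On this toy graph the merging stops with the two triangles as separate communities, because joining them across the single weak link would lower modularity.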
After filtering out any pairs that co-occur fewer than 10 times, we can see a few key insights in the visualization of the resulting subnetwork. Data Mining, Hyperion, Knowledge Management, and Microsoft are some of the most frequently co-occurring concepts and entities. The blue product cluster in the center shows that these entities co-occur frequently with finance terms. Hyperion is densely connected with different types of nodes. The orange group shows terms related to warrantless wiretapping, while the small lime-colored group is the TSA and the Electronic Privacy Information Center.
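The filtering step and the ranking of frequently co-occurring terms might look like the sketch below. The threshold of 10 comes from the text; the pair counts are invented for illustration, and `filter_pairs` and `rank_terms` are hypothetical names.

```python
from collections import Counter


def filter_pairs(pair_counts, min_count=10):
    """Keep only pairs that co-occur at least min_count times."""
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


def rank_terms(pair_counts):
    """Rank terms by their total co-occurrence count across all kept pairs."""
    totals = Counter()
    for (a, b), n in pair_counts.items():
        totals[a] += n
        totals[b] += n
    return totals.most_common()


# Invented counts, for illustration only.
pair_counts = {
    ("Data Mining", "NSA"): 40,
    ("Hyperion", "Microsoft"): 25,
    ("Data Mining", "Microsoft"): 12,
    ("TSA", "Microsoft"): 3,
}
filtered = filter_pairs(pair_counts)
ranked = rank_terms(filtered)
```

Terms whose every pair falls below the threshold disappear from the subnetwork entirely, which is why filtering also changes which nodes are drawn, not only which edges.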