Two types of raw data were collected from multiple sources: the ACM Digital Library, IEEE Xplore, ProQuest, and LexisNexis. Article counts were retrieved automatically from the ACM Digital Library, IEEE Xplore, and ProQuest, while full-text papers were downloaded manually from LexisNexis. The collection process follows five steps:
Step1: Identify the new concepts. e.g. "tree map", "cloud computing"
Step2: Query formulation and expansion. e.g. "tree map" or "tree maps" or "treemap" or "treemaps" or "tree-map" or "tree-maps"
Step3: Understand the search system
Step4: Create the script
Understand and parse the URL -> generate a new URL -> parse the web page -> output the result
Step5: Use the script to collect the article counts (trend data), or manually download the full-text papers.
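Steps 2 and 4 can be illustrated with a minimal Python sketch. It is not the script used in this study; it only shows the general idea under stated assumptions. The function names, the example.org URL, the query parameters "q" and "year", the uppercase OR operator, and the "N results" page text are all hypothetical, and the plural rule (appending "s") is deliberately naive. Each real search system would need its own URL template and its own pattern for the result count.

```python
import re
from urllib.parse import urlencode, urlparse, parse_qs, urlunparse


def expand_query(concept):
    """Step2 (sketch): spelling variants of a multi-word concept.

    Joins the words with a space, with nothing, and with a hyphen,
    and adds a naive plural ("s") for each form.
    """
    words = concept.lower().split()
    variants = []
    for sep in (" ", "", "-"):
        base = sep.join(words)
        for form in (base, base + "s"):
            if form not in variants:  # single-word concepts yield duplicates
                variants.append(form)
    return variants


def to_boolean_query(variants):
    """Quote each variant and join with OR (operator syntax is an assumption)."""
    return " OR ".join(f'"{v}"' for v in variants)


def build_search_url(template_url, query, year):
    """Step4 (sketch): parse an observed search URL and generate a new one.

    The parameter names "q" and "year" are hypothetical; other
    parameters found in the template are kept unchanged.
    """
    parts = urlparse(template_url)
    params = parse_qs(parts.query)
    params["q"] = [query]
    params["year"] = [str(year)]
    return urlunparse(parts._replace(query=urlencode(params, doseq=True)))


def parse_result_count(html):
    """Step4 (sketch): pull a count such as "1,234 results" out of page text.

    Returns None when the assumed pattern is not found.
    """
    match = re.search(r"([\d,]+)\s+results", html, re.IGNORECASE)
    return int(match.group(1).replace(",", "")) if match else None


if __name__ == "__main__":
    query = to_boolean_query(expand_query("tree map"))
    # Hypothetical template URL; no network request is made here.
    print(build_search_url("https://example.org/search?q=old&year=2000", query, 2010))
```

Generating one URL per concept and per year, then fetching each page and applying the count parser, would yield the trend data described in Step5. The fetching step is omitted because it depends on each system's access rules.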
This study explored Natural Language Processing and Social Computing approaches to automatically extract information from the data collected in the first phase. Two types of information are extracted:
Two types of Natural Language Processing methods are used to identify entities and relationships:
Two types of Social Computing systems are used to gather new entities and relationships and to validate the automatically extracted information.
