Foreseeing Scientific and Economic Impacts Through Emerging Topics

by Nicolas Sacchetti

Ali Ghaemmaghami, a PhD candidate in Information Systems Engineering at Concordia University, advocates detecting emerging topics in the AI scientific ecosystem in order to predict their scientific and economic impacts.

At the P4IE Conference on Measuring Metrics that Matter, on May 10, 2022, he presented his paper: Detecting Emerging Topics in AI Scientific Ecosystem.

Investors and funding agencies can use the identification of emerging topics to evaluate different datasets or domains, assessing their potential for emergence. Ghaemmaghami identifies novelty, growth, coherence, impact, and uncertainty as the principal attributes of emerging topics, drawing on the framework by Rotolo et al. (2015).

Traditional detection methods range from soliciting expert opinions and conducting R&D analysis to investigating technical papers and monitoring patent registrations. To measure the pertinent metrics, Ghaemmaghami emphasizes tracking the volume of AI-related papers and the occurrence of specific keywords.

The research team, comprising Ali Ghaemmaghami, Andrea Schiffauerova, and Ashkan Ebadi, suggests using digital text-analysis techniques on scientific journals. They apply the RAKE algorithm — Rapid Automatic Keyword Extraction — a Natural Language Processing (NLP) technique, to the titles and abstracts of scientific articles in order to detect emerging topics in the AI scientific ecosystem. They also assign an emergence score to new topics published in scientific journals.
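To make the extraction step concrete, here is a minimal sketch of the RAKE idea: candidate phrases are obtained by splitting text at stopwords, and each phrase is scored by the degree-to-frequency ratio of its words. The tiny stopword list and the sample abstract are illustrative assumptions, not the team's actual configuration.

```python
import re
from collections import defaultdict

# Illustrative stopword subset; a real RAKE setup uses a full stopword list.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "is", "are", "with", "we"}

def rake_keywords(text):
    """Extract candidate phrases and score them RAKE-style."""
    words = re.findall(r"[a-zA-Z]+", text.lower())
    # Split the word stream into candidate phrases at stopwords.
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # Word score = degree(w) / frequency(w), aggregated over all phrases.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)  # degree counts co-occurring words (incl. itself)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # A phrase's score is the sum of its word scores.
    return sorted(
        ((" ".join(p), sum(word_score[w] for w in p)) for p in phrases),
        key=lambda x: -x[1],
    )

abstract = "We propose a method for detecting emerging topics in the AI scientific ecosystem."
for phrase, score in rake_keywords(abstract)[:3]:
    print(phrase, round(score, 2))
```

Longer multi-word phrases naturally score higher, which is why RAKE surfaces domain terms such as "AI scientific ecosystem" rather than isolated words.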

The emergence score is assigned to each topic to indicate its potential for emergence, calculated based on attributes such as novelty, persistence, growth, and community engagement.
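The article does not publish the exact scoring formula, so the following is only a hypothetical illustration of how normalized attribute signals could be combined into one score; the equal weights and the attribute values are assumptions, not the authors' method.

```python
# Hypothetical combination of attribute signals into one emergence score.
# Weights are an assumption for illustration, not the published formula.
def emergence_score(novelty, persistence, growth, engagement,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of attribute signals, each assumed normalized to [0, 1]."""
    signals = (novelty, persistence, growth, engagement)
    return sum(w * s for w, s in zip(weights, signals))

# Example: a topic that is very novel and fast-growing,
# but not yet persistent or widely engaged with.
score = emergence_score(novelty=0.9, persistence=0.4, growth=0.8, engagement=0.3)
print(round(score, 2))  # 0.6
```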

"We have the data. We extract abstracts and titles from it, analyze them to define and identify emerging topics, and finally, we determine the emergence score for these topics," explains Ghaemmaghami.

Normalization of Datasets

The research group has also derived a mathematical formula N_EScore = (40 * EScore) / √N to normalize datasets, yielding similar results for larger and smaller samples. This facilitates comparisons across different domains using smaller datasets.
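The normalization formula from the article can be applied directly; here N is taken to be the dataset size (e.g. number of papers), which is an assumption about the formula's variable, and the sample scores are invented for illustration.

```python
import math

# N_EScore = (40 * EScore) / sqrt(N), as given in the article.
# Assumption: N is the dataset size (e.g. number of papers analyzed).
def normalized_emergence(escore, n):
    return (40 * escore) / math.sqrt(n)

# Invented example scores: a raw score of 12.0 from a 1600-paper dataset
# and a raw score of 3.0 from a 100-paper dataset normalize to the same
# value, making the two datasets directly comparable.
print(normalized_emergence(12.0, 1600))  # 12.0
print(normalized_emergence(3.0, 100))    # 12.0
```

The 1/√N scaling damps the advantage that larger datasets would otherwise have, which is what enables the cross-domain comparisons described above.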

Their findings indicate that the emergence scores are relatively consistent with the normalized scores, confirming the method’s ability to mitigate the effect of dataset size. They also observed similar trends, albeit with some discrepancies, between two different datasets within the same domain and timeframe. The normalization process enabled the detection of consistent trends using only a subset of the data compared to the entire dataset.

The application of the RAKE algorithm has successfully captured emerging topics in the AI scientific ecosystem. Normalization via the mathematical formula has proven effective on smaller datasets, yielding results comparable to those from much larger ones.

Using diverse datasets revealed consistent trends, and combining them promises even deeper insights.

Thanks to the normalization formula, the measurement of the emergence score can proceed unaffected by dataset size and data source variability.

Looking ahead, this methodology could be adapted for domains beyond the AI science ecosystem.

"Implementing this indicator of emergence could uncover funding and investment opportunities in specific research topics and technologies. Instead of focusing on singular topics like machine learning or big data, this method allows us to identify clusters of terms along with their emergence scores," states Ghaemmaghami.

This content was updated on 2023-11-06 at 2:43 p.m.