Bibliomining and Comparison of Q4 and ESCI WoS-Indexed Journals under the “Statistics and Probability” Category

The field of ‘Statistics and Probability’ has expanded its scope over the last few decades and has become an integral part of many fields, with continuously increasing demand. This manuscript presents a bibliometric analysis and comparison of all documents published during 2015-2019 in journals of the ‘Statistics and Probability’ category, for Q4 Impact Factor (IF) journals and Emerging Sources Citation Index (ESCI) journals of the Web of Science (WoS). Sources with incomplete data for the study timeframe were excluded; 31 sources from Q4-IF and 32 from ESCI journals were selected, yielding 12808 and 4294 documents respectively. After data extraction from WoS, bibliometric analysis at the source, author and document levels was performed using the “Bibliometrix” R package. Q4-IF sources produced around 3 times more documents than ESCI sources. Articles were the main document type for both categories. China and the USA were the leading countries for Q4-IF, while India, the USA and Korea were dominant among ESCI documents. Two authors, namely ‘Cordeiro GM’ and ‘Alizadeh M’, were among the 10 most productive authors in both categories. The sources “Communications in Statistics-Theory and Methods” and “Korean Journal of Applied Statistics” were the leading contributors for the Q4-IF and ESCI categories respectively. For both categories, mainly similar trends were observed for keywords and topic coverage. In both Q4-IF and ESCI journals, ‘Maximum likelihood’ and ‘Order statistics’ were observed to be the most predominant keywords. A consistent publication trend, with some similarities between the two categories, was observed in document production over the years.


Introduction
Statistics and Probability has become an integral subject that offers approaches to deal with structure and to gain insights from data. Additionally, big data is posing new challenges to researchers, and mainly to statisticians (Secchi, 2018). Furthermore, recent advancement in computational methods has drawn wide attention toward this subject from researchers and readers of several disciplines. This field has expanded its scope over the last few decades and has become an integral part of many fields (Donoho, 2017; Drummond & Tom, 2011). As a result, it is in increasing demand and contributes to major areas of research such as Arts & Humanities, Life Sciences & Biomedicine, Physical Sciences, Social Sciences and Technology. Notably, despite its known significance, the number of journals in this specific subject is assumed to be far lower than the number of journals in other subjects. Moreover, there is limited literature examining the research productivity and evolution of 'Statistics and Probability' as a subject (Butt, Forthcoming 2021). However, some positive movement has been observed on the issue over the last few years.
Pichlhöfer, & Maier, 2010; Ronda-Pupo, Díaz-Contreras, Ronda-Velázquez, & Ronda-Pupo, 2015). It encompasses searches across salient databases, disciplines and document types, along with more than one billion searchable cited references (WoS). Although it is uncommon for a journal to fall within a single subject category, since most journals overlap in their coverage, WoS has defined specific subject categories, and each published document inherits all subject categories assigned to its parent journal. 'Statistics and Probability' is one such category in WoS, with around 124 journals categorized in four quartiles (Q1 to Q4) according to their impact (MJL-WoS). Additionally, in late 2015, WoS launched the 'Emerging Sources Citation Index' (ESCI), with more than 7000 journals covering scientific, social science, and humanities literature (ESCI-WoS, 2015). Journals indexed in the ESCI do not receive Impact Factors. However, Journal Citation Reports (JCR) citation counts include citations from ESCI journals, which consequently contribute to the Impact Factors of other journals. Moreover, in a continuously growing, dynamic and diverse literature, ESCI provides WoS users with extended possibilities to explore emerging research areas.
The terms 'Bibliomining' and 'Bibliometrics' are used interchangeably and provide a gateway to evaluate such proceedings and fill the knowledge gap (Abramo & D'Angelo, 2011; Shieh, 2010). In the field of statistics, various bibliometric studies have explored different aspects, such as citation patterns in the journals of statistics and probability (Stigler, 1994), communications between statistical methodology and applied statistics (Eto, 2000), most-cited statistical papers (Ryan & Woodall, 2005)

Methodology
All indexed sources from the WoS 'Statistics and Probability' classification under Q4 Impact Factor (IF) and the Emerging Sources Citation Index (ESCI) during 2015-2019 were selected (Figure 1). All identified journals (sources) were then verified individually against the actual list provided by WoS for the study category and added to the "advanced search" through the field tag: SO= Publication Name [Index]. For further analysis, all 31 sources from Q4-IF and 32 out of 38 ESCI journals were selected; 6 titles were excluded from analysis because of incomplete coverage in WoS, as shown in Figure 1. Data were extracted from WoS in plain text format and converted to bibliometric format, and bibliometric analysis at the source, author and document levels was then performed using the R "Bibliometrix" package (Aria & Cuccurullo, 2017). At the source and author levels, impact was assessed by the h-index and g-index. The h-index measures both the impact of citations and publication productivity (Hirsch, 2005; Vílchez-Román, 2014), while the g-index measures productivity based on the distribution of citations received by a researcher's publications (Egghe, 2006). Average citations per document is an indicator showing citations per publication, used to quantify the impact of authors, countries, and journals (Yi, Ao, & Ho, 2008). It is calculated as the total number of citations received divided by the total number of documents published by a journal, to assess the yearly impact, and it provides a fairer evaluation of author and journal activity (Harzing, 2010). A dendrogram was used to evaluate keywords.
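The three indicators described above can be illustrated with a minimal Python sketch. This is not the Bibliometrix implementation used in the study; the function names and the sample citation counts are hypothetical and chosen only for illustration. The sketch assumes the standard definitions: the h-index is the largest h such that h documents have at least h citations each (Hirsch, 2005), and the g-index is the largest g such that the g most-cited documents together have at least g² citations (Egghe, 2006), with g capped here at the number of documents.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations: Sequence[int]) -> int:
    """Largest g such that the top g documents together have >= g^2 citations.

    Assumption: g is capped at the number of documents.
    """
    ranked = sorted(citations, reverse=True)
    cumulative = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        cumulative += cites
        if cumulative >= rank * rank:
            g = rank
    return g


def average_citations_per_document(citations: Sequence[int]) -> float:
    """Total citations divided by the number of documents (0.0 if empty)."""
    return sum(citations) / len(citations) if citations else 0.0


# Hypothetical citation counts for five documents of one source or author.
sample = [10, 8, 5, 4, 3]
print(h_index(sample), g_index(sample), average_citations_per_document(sample))
```

For the hypothetical sample, four documents have at least four citations each, so the h-index is 4; all five documents together have 30 citations, which is at least 25, so the g-index is 5; and the average is 30 / 5 = 6 citations per document.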

Results
The search strategy yielded a total of 12808 documents from 31 Q4-IF journals and 4294 documents from 32 ESCI journals, by 16121 and 6374 authors respectively. A similar collaborative index was observed for Q4-IF (1.41) and ESCI (1.61) journals. A higher average number of citations per document (1.719) was observed in Q4-IF journals as compared to ESCI journals. Details of the bibliometric characteristics of the database are given in Table 1. Table 3 shows the top 5 countries by author appearances, corresponding author country and locally cited papers. China and India were found to have the most author appearances and corresponding authors.
In Q4-IF journals, China, Iran and the USA showed the leading contributions. Ferdowsi Univ Mashhad showed strong collaboration between China and Iran. King Abdulaziz Univ showed strong linkage with China and a few links with other countries. McMaster Univ showed strong collaboration between China and Canada, while Nankai Univ showed greater collaboration with China and Turkey.
In ESCI journals, Korea, Egypt and India were the leading contributors. Benha Univ showed strong linkages among researchers from Egypt, India and the USA, while King Abdulaziz Univ collaboration was observed with Egypt, India and Pakistan. Maximum likelihood estimation was observed to be the most frequently occurring author keyword in both Q4-IF and ESCI journals. Details are given in the Sankey diagram in Figure 2.
Figure 3: Top 5 sources trend by year and impact
Among Q4 journals, "Communications in Statistics-Theory and Methods" was the leading contributor, as shown in Figure 3 (a). The same five journals also appeared among the top 5 journals with respect to impact. Figure 3 (b) shows the top 5 ESCI journals, where "Korean Journal of Applied Statistics" was found to be the most productive.
In Q4-IF journals, 'Distribution', 'Estimation', 'Asymptotic normality', 'Maximum likelihood' and 'Order statistics' were the top trending author keywords, while in ESCI journals 'Maximum Likelihood Estimation', 'Order statistics', 'Simulation', 'Bias' and 'Maximum likelihood' were observed to be the most trending author keywords. Collectively, in both Q4-IF and ESCI journals, 'Maximum likelihood' and 'Order statistics' were observed to be the most predominant keywords.

Discussion
This article presents a bibliometric analysis of all documents published in Q4-IF and ESCI journals in the WoS category 'Statistics and Probability' between 2015 and 2019. In general, a uniform trend in the number of publications each year was observed, with articles forming the majority in both categories. An almost equal number of journals was found in each category, but the number of documents produced by the 31 Q4-IF journals was nearly 3 times that of the 32 ESCI journals in the same timeframe. Similar trends were observed for most of the other variables, including the number of authors. Articles were the most common document type, followed by editorials, and single-author documents made up less than one quarter of the documents in both categories. Although the majority of the published documents were multi-author documents, a relatively high share of single-authored documents (23% and 22%) was observed in Q4-IF and ESCI journals respectively.
For the Q4-IF category, the authors 'Balakrishnan N' and 'Nadarajah S' were the most productive, and the authors 'Balakrishnan N' and 'Cordeiro GM' received the most citations. For ESCI, the authors 'Hamedani GG' and 'Cordeiro GM' were the most productive, and the authors 'Afify AZ' and 'Cordeiro GM' received the most citations. Two authors, 'Cordeiro GM' and 'Alizadeh M', were among the 10 most productive authors in both categories. Interestingly, both of these prolific authors showed relatively fewer contributions as corresponding and/or first authors as compared to other leading authors.
In terms of authors' country appearances, China, the USA, France, Iran and India were the leading countries for Q4-IF documents, while for ESCI documents India, the USA, Korea, Iran and France were leading. Mostly similar trends were observed for corresponding author countries for both Q4-IF and ESCI documents. The USA and India were common among the top 5 contributors for both the Q4-IF and ESCI categories. Among affiliations in Q4-IF journals, Ferdowsi Univ Mashhad showed relatively more collaboration between China and Iran. King Abdulaziz Univ showed strong linkage with China and a few links with other countries. McMaster Univ showed strong collaboration between China and Canada, while Nankai Univ showed greater collaboration with China and Turkey. For ESCI journals, Korea, Egypt and India were the leading contributors. Benha Univ showed strong linkages among researchers from Egypt, India and the USA, while King Abdulaziz Univ collaboration was observed with Egypt, India and Pakistan. Maximum likelihood estimation was observed to be the most frequently occurring author keyword in both Q4-IF and ESCI journals. The study findings also suggest that, although developed countries and their affiliations were more common among the top contributors in the Q4-IF category than in the ESCI category, many Asian countries and affiliations were dominant contributors in both categories.
Among Q4-IF journals, "Communications in Statistics-Theory and Methods" was the leading contributor throughout all 5 years, with a relatively higher h-index and total citations, followed by "Communications in Statistics-Simulation and Computation"; a peak in publication frequency was observed in 2017 for both journals. However, "Journal of Biopharmaceutical Statistics", "Statistics & Probability Letters" and "Hacettepe Journal of Mathematics and Statistics" showed a steady trend over time. The same five journals also appeared among the top 5 journals with respect to impact. For ESCI journals, "Korean Journal of Applied Statistics" was found to be the most productive, while a sharp decline in publication frequency was observed for "Journal of Statistics & Management Systems". The journals "Advances and Applications in Statistics", "Journal of Statistical Theory and Practice" and "Pakistan Journal of Statistics and Operation Research" showed a steady trend over time. With respect to impact, "Wiley Interdisciplinary Reviews-Computational Statistics" and "Metron-International Journal of Statistics" were shown to be predominant.
Mainly three and four clusters of keywords/topics were found in the Q4-IF and ESCI categories, with similarities and overlap. Diverse but mainly similar trends in keywords and topic coverage were observed for both Q4-IF and ESCI journals, while 'Maximum likelihood' and 'Order statistics' were observed to be the most predominant topics collectively. Although the limited literature and data available for comparison was a limitation, it also points to the need for further and continuous exploration of trends and relevant analysis.
In terms of other limitations, the analysis was conducted only on WoS Q4-IF and ESCI journals in the "Statistics & Probability" category, within the limited timeframe of 2015-2019, which may limit the generalizability of the findings to the category as a whole. Secondly, the WoS database may have some unidentified issues; however, the findings shared here for the leading contributors were manually verified. Additionally, because of continuous changes and updates, the publication data available for analysis may differ depending upon the date of search and the timeframe. Metadata from other sources might be beneficial to complement this study and provide comprehensive context on the subject.

Conclusion
Considering the scarcity of literature on 'Statistics & Probability' publication trends, despite the subject's significance in research and academia, this paper helps to fill the gap by providing an overview and salient trends in the WoS Q4-IF and ESCI "Statistics & Probability" category (2015-2019). A consistent publication trend was observed in terms of document production, but Q4-IF productivity was considerably higher. Articles were the major type of document for both. Among prolific authors, only two were common between the Q4-IF and ESCI categories. Overall, 114 countries contributed to the selected Q4-IF journals, led by China and the USA, while India, the USA and Korea were leading for ESCI journals among 106 countries. Mainly similar trends were observed for impact and for contributions as corresponding and/or first authors. Diverse but mainly similar trends in keywords and topic coverage were observed for both categories. Although the limited literature and data available for comparison was a limitation, it also points to the need for further and continuous exploration of trends and relevant analysis. In conclusion, the bibliometric findings of this study can help relevant stakeholders, and particularly researchers, to better understand the performance and trends of the study subject and to make better-informed decisions.