With the advent of the Web 2.0 era, virtual communities of many kinds have been emerging. Knowledge communities such as Quora are a new generation of network community characterized by knowledge sharing and exchange, and they typically add social features to motivate users to generate more content. Because they provide a platform for group cognition, group collaboration, and group decision-making, such communities are often regarded as a prototype of Collective Intelligence (CI).
Research on collective intelligence has long flourished in sociology, where it has focused mainly on collective intelligence in offline organizations and enterprises. The psychologist Muller (1970) demonstrated the important role of collective intelligence in educational choice. Surowiecki (2004) regards collective intelligence as a new management philosophy and model that has driven major changes in enterprise management. With the evolution of the Internet, however, the dramatic growth in users and the ease of accessing online environments have produced far more complex groups. Moreover, although the volume of online information keeps growing, the amount of information an individual can process is limited. Because the characteristics of collective intelligence have changed so much, it is necessary to study collective intelligence further in these new settings, especially in knowledge communities.
A body of research addresses collective intelligence in knowledge communities. Tapscott indicated that the Internet provides a platform for large-scale collaboration in which everyone can participate in knowledge production. However, studies on the emergence of collective intelligence have mostly been qualitative; quantitative theories and methods remain scarce. Meanwhile, many knowledge communities emphasize their social attributes to motivate continued participation. For example, people can communicate easily through the social network, which can promote the spread of knowledge. Yang (2014) explored the factors influencing knowledge sharing among community members by studying the social network structure of a virtual community. However, few studies have examined the social dimension of knowledge communities.
In this paper, we explore the effect of group elements on collective intelligence by computing structural indices with social network methods, in order to identify the group structures and member characteristics that benefit collective intelligence. Social network analysis is a network method endowed with sociological significance, so it allows us to explore the sociological connotations of collective intelligence. Finally, experimental data from the “Zhihu” community are used to refine and verify the hypothesized model. The results reveal the emergence mechanism and driving factors of collective intelligence and provide important implications for improving the level of collective intelligence in knowledge communities.
The remainder of the paper is organized as follows. Section 2 reviews the related literature. Section 3 describes the hypothesized model and research methods. Section 4 presents the verification of the proposed model on a sample database from “Zhihu”. Finally, Section 5 concludes the paper.
2. Related Works
This section introduces the fields covered in this paper and reviews previous research, focusing on three aspects: collective intelligence, knowledge communities, and social network analysis methods. In defining the knowledge community, we first introduce the evolution of Internet Web technologies and virtual communities and the emergence of knowledge communities. Finally, we summarize the research methods and the social network analysis used in this paper.
2.1. Collective Intelligence
Collective intelligence is a process in which group members use their knowledge to adapt to the environment, put forward different views or methods, and eventually arrive at a better explanation of, or solution to, a problem. Based on previous studies, we define collective intelligence as the problem-solving capability that emerges when individuals participate and collaborate. This definition emphasizes that collective intelligence is a capacity, not simply the sum of individual abilities.
At present, there is little research on measuring collective intelligence; more work addresses the factors that influence its formation and development. For example, Len Fisher argues that communities can make better decisions by drawing on the coordination mechanisms of biological populations in nature. Cai (2012) holds that collective intelligence is affected by factors such as group size, membership heterogeneity, group cohesion, communication technology, and group conflict. Scholars believe that group size, membership diversity, independence and decentralization, and even technical factors have a significant impact on collective intelligence, but these factors are stated in rather abstract terms and have not been incorporated into a theoretical framework. Mačiulienė (1970) organized the measurement of collective intelligence into three dimensions: 1) the Capacity Dimension, representing the integration and creation of knowledge and the group decision-making process; 2) the Emergence Dimension, representing self-organization, adaptability, and gathering effects; and 3) the Social Mutuality Dimension, representing members' social impact and social motivations. Other methods are summarized in Table 1.
Table 1. Methods of research on collective intelligence.
2.2. Knowledge Communities
The popularity of the World Wide Web provides a more convenient platform for communication and sharing and has fueled the growth of many kinds of knowledge communities. Howard (1993) describes the knowledge community as a social gathering place where people discuss public events, express emotions, and form online relationships. The defining characteristic of a knowledge community is that networked interconnection lets people communicate freely across time and space.
There are few studies of knowledge communities. Song (2015) used SNA (Social Network Analysis) to study a social Q&A platform, applying network density, cohesive subgroup analysis, structural holes, and network centrality; this work is an important reference for the present article. Wang (2013) used quantitative analysis to study the well-known social Q&A platform Quora and revealed three kinds of network structure on such platforms, which guides this research. J. Koh & Y.G. Kim (2004) surveyed and analyzed members of a knowledge community and found that knowledge sharing stimulates more members to participate in communication and increases members' loyalty to the community service provider. Alexander Ardichvili & Vaughn Page (2003) suggest that the willingness to share knowledge stems from community culture, self-interest, and trust among members.
Scholars have analyzed various factors that affect knowledge communities, but there is little research on their social attributes. According to a research report by iResearch.cn, the development of knowledge communities has focused on SNS (Social Network Service) features, and users of SNS websites show a strong tendency to share. This article studies the effect of social attributes on the emergence of collective intelligence in knowledge communities.
2.3. Social Network Analysis (SNA)
SNA synthesizes many disciplines: it combines social theory with application and formal mathematics with statistics, and in particular graph theory with sociology. The most representative work is Granovetter's (1973) theory, which distinguishes the strength of ties and points out the importance of weak ties. Since the 1970s, beyond discussing the method itself, social network analysis has also explored special network forms such as small groups, cohorts, social circles, the internal networks of organizations, and market networks.
Little research has applied SNA to knowledge communities. Jankowski-Lorek (2016) used social network modeling to study the behavior patterns of Wikipedia authors, in order to provide insights into how Wikipedia can better manage false information and editing conflicts. Yang (2014) explored the factors influencing knowledge sharing among community members by studying the social network structure of a virtual community. Fu (2009) applied network quantification methods to analyze the social network structure of a virtual community and, using network density and centrality, discussed the differences between network structures based on virtual and real-world relationships.
We use SNA because social network attributes are a significant feature distinguishing knowledge communities from traditional Q&A platforms; this paper mainly uses SNA to quantify the factors influencing collective intelligence. Many excellent social network analysis tools, such as UCINET, Pajek, and Gephi, greatly facilitate research and have accelerated the application of network methods in sociology. In this paper, we use Pajek to calculate link strength, cohesion, centrality, and social capital in the social network.
3. Hypothesis and Research Method
This section builds on previous theory and on the results of a case analysis of the community to summarize the research framework. Next, the research hypotheses are put forward and the corresponding variables and observation indicators are determined, in order to explore the factors that influence the emergence of collective intelligence in knowledge-community groups. Finally, the indicator calculations, main research methods, and analytical software tools are determined.
3.1. Research Theoretical Framework
The theoretical framework of this paper revolves around the emergence of collective intelligence. Emergence is a phenomenon associated with complex systems: the appearance of new structures and new properties during self-organization. In constructing the framework, we merge Mačiulienė’s three dimensions for measuring collective intelligence into two: social network and collective intelligence effectiveness. The social network dimension supplies the factor variables of the framework, and collective intelligence effectiveness supplies the outcome variables.
For the social network dimension, this paper first draws on research on peer production systems, which are closely related to the emergence of collective intelligence, as the definition of peer production shows. We adopt Zhu’s definition of peer production: with the help of the Internet, people from all over the world use relevant tools to participate in large-scale knowledge production activities. The social network dimension is further subdivided into interactive network structure and participant characteristics. Accordingly, the influence of the social network dimension on collective intelligence effectiveness decomposes into the influence of the interactive network structure and the influence of participant characteristics.
Collective intelligence effectiveness refers to the knowledge, ability, and psychological encouragement generated by group collaboration; we use it to measure the strength of knowledge production and of participation in group decision-making. Knowledge production refers to knowledge-creating activities, which include not only the emergence of new knowledge but also the integration and innovation of existing knowledge. Group decision-making has long been a popular research topic, with a focus on the factors affecting decision quality. Our study adopts Lan’s view that the reliability of group decision-making is proportional to group size. Therefore, a group's decision-making ability can be measured directly by the scale of group participation; that is, a larger group improves the effectiveness of group decisions.
In sum, the theoretical framework for the emergence of collective intelligence is shown in Figure 1, where the arrows represent the relationships among the dimensions.
3.2. Research Hypothesis
Based on the framework above, we put forward research hypotheses in two areas: the impact of the interactive network structure on collective intelligence effectiveness, and the influence of participant characteristics on collective intelligence effectiveness.
3.2.1. The Influence of Interactive Network Structure
In this paper, the interactive network structure variables are link strength, degree of cohesion, and centrality. Although all three are important concepts in social network analysis, they emphasize different things. Link strength concerns the number of connections in the network, whereas the degree of cohesion concerns how those connections are distributed. For a similar number of connections, the more concentrated the distribution of connections, the lower the overall cohesion of the network. Centrality concerns the degree of differentiation between core nodes and peripheral nodes: the higher the centrality, the more hierarchical the network.
Figure 1. The framework of theory.
The stronger the links between group members, the greater the likelihood of generating new knowledge. Chang (2011) finds that when members of a virtual community are more connected, they have a stronger willingness to share knowledge. Tsai et al. (1998) suggested that interaction and association among members have a positive impact on knowledge sharing. Yli-Renko et al. (2002) argue that the more frequently a group communicates, the more knowledge is exchanged. The average node degree is the mean number of links per node: when the number of nodes is fixed, a more connected network has a higher average node degree and stronger connections between its nodes. Hence the hypotheses:
H1: The link strength of the network has a positive effect on collective intelligence effectiveness.
H1a: The scale of network connections has a positive effect on collective intelligence effectiveness.
H1b: The average node degree of the network has a positive effect on collective intelligence effectiveness.
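To make the two link-strength indicators concrete, here is a minimal Python sketch that computes the connection scale (number of distinct edges) and the average node degree from an undirected edge list. The function name `link_strength` and the toy data are illustrative assumptions, and the sketch only approximates what a tool such as Pajek reports.

```python
from collections import defaultdict


def link_strength(edges):
    """Return (connection scale, average node degree) of an undirected network.

    `edges` is an iterable of (u, v) pairs. Duplicate pairs and self-loops are
    ignored, and only nodes that appear in at least one edge are counted.
    """
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}
    degree = defaultdict(int)
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    n_nodes = len(degree)
    # Each undirected edge adds 1 to the degree of both of its endpoints.
    average_degree = 2 * len(unique_edges) / n_nodes if n_nodes else 0.0
    return len(unique_edges), average_degree


# Hypothetical toy member-question edges; the repeated pair is counted once.
toy_edges = [("m1", "q1"), ("m2", "q1"), ("m2", "q2"), ("m3", "q2"), ("m1", "q1")]
print(link_strength(toy_edges))  # (4, 1.6)
```

As a rough consistency check, the counts reported for the “behavioral science” network in Section 4.2 (13,531 nodes, 15,549 edges) give 2 × 15,549 / 13,531 ≈ 2.30, inside the 2 to 3 range of average node degree observed for most topic networks.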
Past research has mainly measured the degree of cohesion by exploring the cohesive subgroups of a network. Tirado ran a 6-week experiment with 10 comparison groups and concluded that the cohesion index had a positive effect on both the social and the cognitive dimensions of knowledge construction. The average path length of a network can also be used to measure cohesion. Strogatz (2001) proposed the small-world network model, in which the average path length is a defining measure. The shorter the average path length, the closer the nodes are to one another and the higher the network cohesion. Higher cohesion means higher participation by community members, and the additional knowledge generated is conducive to the formation of collective intelligence. Hence the following hypotheses:
H2: The degree of network cohesion has a positive effect on collective intelligence effectiveness.
H2a: The number of network cohesive subgroups has a negative effect on collective intelligence effectiveness.
H2b: The average path length of the network has a negative effect on collective intelligence effectiveness.
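The average path length used for H2b can be sketched with breadth-first search. This illustrative Python version assumes an unweighted, undirected network and simply skips pairs of nodes that cannot reach each other; the name `average_path_length` is hypothetical.

```python
from collections import defaultdict, deque


def average_path_length(edges):
    """Mean shortest-path length over all ordered pairs of reachable nodes.

    Pairs in different components are skipped, a common convention for
    networks that are not connected.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)
    total_length, pair_count = 0, 0
    for source in adjacency:
        # Breadth-first search gives hop distances in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total_length += sum(distance.values())
        pair_count += len(distance) - 1
    return total_length / pair_count if pair_count else 0.0


path = [("a", "b"), ("b", "c"), ("c", "d")]
print(average_path_length(path))  # 20 / 12, about 1.667
```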
Past research on network centrality has studied the relationship between a node's central position in the network and knowledge transfer. Most studies agree that the closer individuals are to the center of the network, the more positive their impact on knowledge sharing or transfer. However, little literature examines how the centrality of the network as a whole affects emergence. Previous research suggests that network heterogeneity is a key factor in the emergence of collective intelligence. When connections are distributed more evenly, the network structure is flatter, which means it is hard to find opinion leaders with a significant influence on other members. Network-level centrality is usually quantified by three indicators: degree centrality, closeness centrality, and betweenness centrality. Hence the hypotheses:
H3: Network centrality has a negative impact on collective intelligence effectiveness.
H3a: Degree centrality has a negative effect on collective intelligence effectiveness.
H3b: Closeness centrality has a negative effect on collective intelligence effectiveness.
H3c: Betweenness centrality has a negative effect on collective intelligence effectiveness.
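The network-level (centralization) versions of these indicators can be sketched as follows, assuming Freeman's centralization formulas, which the text does not specify. The sketch covers degree and closeness centralization; betweenness centralization follows the same pattern but needs shortest-path counting (for example Brandes' algorithm) and is omitted for brevity. All function names are hypothetical.

```python
from collections import defaultdict, deque


def _adjacency(edges):
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def degree_centralization(edges):
    """Freeman degree centralization: 1.0 for a star, 0.0 when all degrees are equal."""
    adjacency = _adjacency(edges)
    n = len(adjacency)
    if n < 3:
        return 0.0
    degrees = [len(neighbours) for neighbours in adjacency.values()]
    top = max(degrees)
    # (n - 1)(n - 2) is the largest possible sum, reached by a star graph.
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


def closeness_centralization(edges):
    """Freeman closeness centralization for a connected network (1.0 for a star)."""
    adjacency = _adjacency(edges)
    n = len(adjacency)
    if n < 3:
        return 0.0
    closeness = []
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        if len(distance) != n:
            raise ValueError("closeness centralization needs a connected network")
        closeness.append((n - 1) / sum(distance.values()))
    top = max(closeness)
    # With normalized closeness, a star reaches the maximum (n - 1)(n - 2) / (2n - 3).
    return sum(top - c for c in closeness) * (2 * n - 3) / ((n - 1) * (n - 2))


star = [("hub", f"leaf{i}") for i in range(4)]
print(degree_centralization(star))  # 1.0
```

A flat, evenly connected network (such as a cycle) scores 0 on both measures, which is the "decentralized" end that H3 associates with higher collective intelligence effectiveness.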
3.2.2. The Influence of the Participant’s Characteristics
We consider participants because the level of collective intelligence may differ even between communities with similar network structures when their members differ. To understand community members, the most important factors are individual attributes, behavioral preferences, and knowledge structure. Both the scale of group participation and the diversity of members are known to be very important for the emergence of collective intelligence. For measuring the diversity of group members, the mainstream view is that both the dominant and the recessive features of members should be measured. Dominant features include age, gender, education, and occupational background; recessive features include attitudes, values, knowledge, and ability. Because this study is conducted in a virtual knowledge community, it is difficult to obtain members' true dominant and recessive features. In addition, the member network cannot be constructed from members' mutual-following behavior, because “Zhihu” protects user data. Therefore, the member network Mi is constructed from the member-question network Qi, treating two members as interacting if they answered the same question; the degree of member diversity is then measured on Mi using the island algorithm of social network analysis.
The other aspect of participant characteristics is the demarcation between core and non-core members of the community, which are identified from the member network Mi. Distinguishing core from non-core members allows us to assess the impact of social connections on collective intelligence. According to Gang Wang et al. (2013), core members have more followers, and each response by a core member pushes a notification to those followers; core members' responses therefore reach more users and tend to obtain more votes. At the same time, Lan (2009) argues that the effectiveness of group decisions depends on whether members are independent of one another. Thus social factors contribute to the prosperity of knowledge communities but inevitably also have negative effects. In this paper, this effect is called the core member effect, and we expect a community in which the core member effect is not significant to be more conducive to collective intelligence. Therefore, we propose the following hypotheses:
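Building the member network Mi from the member-question network Qi is a one-mode projection. A minimal Python sketch, assuming the raw data are (member, question) answer pairs and that an edge weight counts the number of shared questions (the weighting is an assumption; the text only says that members who answer the same question interact):

```python
from collections import defaultdict
from itertools import combinations


def project_members(answers):
    """Project (member, question) answer pairs onto a weighted member network.

    Two members are linked if they answered at least one common question;
    the edge weight counts how many questions they share.
    """
    answerers = defaultdict(set)
    for member, question in answers:
        answerers[question].add(member)
    weights = defaultdict(int)
    for members in answerers.values():
        # Sorting makes each unordered pair appear under a single key.
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)


answers = [("m1", "q1"), ("m2", "q1"), ("m3", "q1"), ("m2", "q2"), ("m3", "q2")]
print(project_members(answers))
# {('m1', 'm2'): 1, ('m1', 'm3'): 1, ('m2', 'm3'): 2}
```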
H4: The diversification of community members has a positive impact on collective intelligence effectiveness.
H5: The core member effect in the network has a negative impact on collective intelligence effectiveness.
4. Network Model and Hypothesis Testing
4.1. Data Collection
This paper chooses “Zhihu” as the research setting. Zhihu contains three kinds of network structure: the social network among members; the member-question network, formed by members following questions they are interested in; and the member-topic network, formed by members following topics they are interested in. Zhihu selects elite answers for each question; these result from group effort and represent the integration and creation of knowledge. In this sense, Zhihu reflects collective intelligence: it encourages members to produce knowledge, selects high-quality knowledge through collective voting, and includes interaction between members.
This paper uses LocoySpider to collect data from the Zhihu community. To obtain more samples, we chose the second- and third-level sub-topics under the topics “social science”, “natural science”, and “formal science”, yielding 27 samples. The collected data include the topic number, the number of followers, the number of questions, and the elite answers. As of December 8, 2015, a total of 1,246,830 questions had been collected, including 15,852 elite questions in which 31,003 community members participated. The detailed data are shown in Table 2.
As Table 2 shows, questions with 50 or more answers are a small minority, whereas about one-third of questions have no answer at all. For example, in the topic “Communication studies”, 333 questions have 50 or more answers (0.89%), but 15,495 questions have no answer (41.50%). Thus a considerable number of questions are not adequately discussed in the Zhihu community. Figure 2 shows the answer distribution for “Communication studies”: 15,495 questions received no answer, about 94.6% of questions have fewer than 10 answers, and the largest number of answers to a single question is 4466, which indicates that most answers are concentrated on a few questions.
Next, to explore the relationship between the number of followers and the number of questions, the samples are divided into three layers by follower count: topics with fewer than 10,000 followers, topics with 10,000 to 100,000 followers, and topics with more than 100,000 followers. Pearson correlation coefficients between the two variables were calculated in SPSS; the results are shown in Table 3.
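The stratified correlation can be sketched in plain Python as below. The layer boundaries follow the text, but placing topics with exactly 10,000 or exactly 100,000 followers in the middle layer is an assumption, as is skipping layers with fewer than three topics. SPSS additionally reports significance levels, which this sketch does not.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def follower_layer(followers):
    """Assign a topic to one of the three follower layers (boundaries assumed)."""
    if followers < 10_000:
        return "<10k"
    if followers <= 100_000:
        return "10k-100k"
    return ">100k"


def correlation_by_layer(topics, min_size=3):
    """topics: (followers, questions) per topic -> {layer: Pearson r}."""
    layers = defaultdict(list)
    for followers, questions in topics:
        layers[follower_layer(followers)].append((followers, questions))
    return {
        name: pearson([f for f, _ in rows], [q for _, q in rows])
        # With only two points r is trivially +1 or -1, so small layers are skipped.
        for name, rows in layers.items()
        if len(rows) >= min_size
    }
```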
Table 2. Data collection results overview.
Table 3. Correlation between topic follower count and number of questions.
Note: * = 0.05 significance level.
Figure 2. Distribution of answer counts for the topic “Communication Studies”.
A significant correlation appears only for topics with more than 100,000 followers, which suggests that, to some extent, the number of questions grows with attention. Two points about the data should be explained. 1) LocoySpider can retrieve only the first 50 answers to each question, so we crawled only the 50 top-voted answers for each question. As shown above, only a small proportion of questions have more than 50 answers, and answers ranked below the top 50 receive few votes, so crawling only 50 answers does not affect the subsequent analysis. 2) Unanswered questions make up a large proportion of all questions across topics, partly because many were raised recently and no one has answered them yet, and partly because the questions themselves have little value. Including these unanswered questions in the network model would introduce many missing values, so we decided to build the network model only from elite answers.
4.2. Network Model
First, Pajek is used to build the bipartite network model Qi for each topic. Topics serve as the network boundaries, and a total of 27 member-question networks were built. For example, the network model for “behavioral science” is shown in Figure 3. It contains 13,531 nodes: 628 elite questions, shown as black nodes, and 12,903 members, shown as white nodes. It has 15,549 undirected edges, each representing a member answering a question. Pajek is then used to calculate the structural indices of each topic network; the results are shown in Table 4.
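For readers reproducing the modeling step, a bipartite network like Qi can be written in Pajek's .net format. The sketch below assumes Pajek's two-mode convention, in which a header `*Vertices N N1` marks the first N1 vertices as the first mode; please check this against the Pajek manual. The function name is hypothetical, and labels containing double quotes are not escaped.

```python
def to_pajek_two_mode(answers):
    """Serialize (member, question) pairs as a Pajek two-mode .net string.

    Questions are numbered first, so '*Vertices N N1' marks vertices 1..N1 as
    questions and the rest as members. Duplicate pairs are written once.
    """
    questions = sorted({q for _, q in answers})
    members = sorted({m for m, _ in answers})
    question_id = {q: i for i, q in enumerate(questions, start=1)}
    member_id = {m: i for i, m in enumerate(members, start=len(questions) + 1)}
    lines = [f"*Vertices {len(questions) + len(members)} {len(questions)}"]
    lines += [f'{i} "{q}"' for q, i in question_id.items()]
    lines += [f'{i} "{m}"' for m, i in member_id.items()]
    lines.append("*Edges")  # undirected: a member answered a question
    lines += [f"{question_id[q]} {member_id[m]}" for m, q in sorted(set(answers))]
    return "\n".join(lines) + "\n"


print(to_pajek_two_mode([("m1", "q1"), ("m2", "q1"), ("m2", "q2")]), end="")
```

Keeping the two id dictionaries separate means a member and a question that happen to share a label still get distinct vertex numbers.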
Figure 3. “Behavioral Science” topics bipartite network model Qi.
Table 4. Structural indices of the topic networks Qi.
The collected data cover whole topics in the community, which indicates that the sample is usable. Of the sub-communities, 14 have more than 10,000 participants, and there are 18 answers on average, indicating that most sub-communities have a high degree of participation. In terms of node degree, the average node degree of most networks lies between 2 and 3, meaning that members answer three or fewer questions on average, and the maximum node degree in a topic network has little effect on the average. For example, in the “theoretical computer science” topic, although one member answered 102 questions, the average node degree of the network is still 2.59. The number of subgroups decreases as the participation scale increases. Most networks have an average path length of about 6, with an overall mean of 6.55. All three centrality indicators have small values, especially degree centrality. Based on this analysis, the maximum node centrality value reflects little of the network's overall situation. Therefore, participation scale and maximum node centrality are not included in the research hypotheses.
4.3. Defining Participant Characteristics
4.3.1. Measurement of Member Diversity
Member diversity is measured with the island algorithm from social network analysis. Islands are used to find the core parts of a network: the core part of the network is extracted first, and the core nodes are then divided into sub-networks. Applied to the member network, the island algorithm in essence classifies members according to whether they answered the same questions. The resulting sub-networks indicate relative differences between members, and the number of categories represents the degree of member diversity: the more islands, the more diverse the community's members.
In this paper, the final core members are identified as follows: 1) all members who answered a question are participants; 2) a bipartite network model is built from the member-question relation; 3) the initial core members are identified with the island algorithm; 4) important members are those with an answer that received 5 or more votes; 5) the intersection of the initial core members and the important members gives the core members of the topic. The final results are shown in Table 5. For member diversity, the number of network islands is the main indicator: the more islands, the more diverse the members of the knowledge community.
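Pajek's island algorithm is not reproduced here. As a clearly simplified stand-in that conveys the idea of counting tightly linked member groups, the sketch below applies a line cut to a weighted member network (dropping edges below a weight threshold) and counts the remaining connected components within a size range. The threshold, the size limits, and the function name are all assumptions.

```python
from collections import defaultdict


def count_threshold_groups(weights, threshold, min_size=2, max_size=None):
    """Count groups in a weighted member network after a line cut.

    weights: {(member_a, member_b): weight}. Edges lighter than `threshold`
    are dropped; each remaining connected component whose size lies in
    [min_size, max_size] counts as one group. This is a simplified
    stand-in, not Pajek's island algorithm.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for (a, b), weight in weights.items():
        if weight >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b
    sizes = defaultdict(int)
    for node in list(parent):
        sizes[find(node)] += 1
    return sum(
        1
        for size in sizes.values()
        if size >= min_size and (max_size is None or size <= max_size)
    )


member_weights = {("m1", "m2"): 3, ("m2", "m3"): 3, ("m3", "m4"): 1,
                  ("m4", "m5"): 2, ("m5", "m6"): 2}
print(count_threshold_groups(member_weights, threshold=2))  # 2
```

Raising the threshold splits the network into fewer, tighter groups; the real island algorithm avoids picking a single global cut by looking for locally heavier regions.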
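Steps 4 and 5 reduce to a filter and a set intersection. A small Python sketch, assuming the island step has already produced the set of initial core members and that each answer is available as a (member, votes) pair; the names are hypothetical:

```python
def final_core_members(island_members, answer_votes, min_votes=5):
    """Intersect initial core members (from islands) with important members.

    island_members: members placed in islands by the island step.
    answer_votes: (member, votes) pairs, one per answer.
    A member is important if at least one answer has >= min_votes votes.
    """
    important = {member for member, votes in answer_votes if votes >= min_votes}
    return set(island_members) & important


islands = {"m1", "m2", "m3"}
votes = [("m1", 12), ("m2", 3), ("m4", 40), ("m2", 5)]
print(sorted(final_core_members(islands, votes)))  # ['m1', 'm2']
```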
4.3.2. Measurement of Core Member Effect
The core member effect is measured with two indicators. The vote ratio of core members, μ1, is the share of votes received by core members' answers. Let m denote the total number of responses by core members, Aj the number of votes received by each member's answer, and n the total number of members; we then propose the following equation:
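A form consistent with these definitions, assuming that the first m terms A_1, ..., A_m are the votes on core members' answers and that the denominator sums over all n members, is the share of all votes that goes to core members:

```latex
\mu_1 = \frac{\sum_{j=1}^{m} A_j}{\sum_{j=1}^{n} A_j}
```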
Table 5. Definition of final core member.
The best-answer ratio of core members is denoted μ2, where N is the number of questions whose highest-voted answer comes from a core member and q is the number of elite questions in the sub-community, that is,
Finally, μ1 and μ2 are combined into σ, and σ is normalized to obtain the core member effect score, that is,
The results are shown in Table 6. The vote ratio of core members μ1 is mostly small: the largest value is 0.446, and 17 sub-communities have values below 0.1, indicating that core members have limited influence. The best-answer ratio of core members μ2 is also low, no more than 0.4. Combining μ1 and μ2 gives the core member effect of each topic sub-community; the higher this value, the greater the influence of core members on the other members of the sub-community. Because the core member effect is low in every sub-community, we conclude that community members select the best answer mainly on the basis of the answer content itself rather than on who the core members are.
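Read literally, this definition gives the following ratio (an assumed form, since the text defines only the symbols):

```latex
\mu_2 = \frac{N}{q}
```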
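A minimal sketch of the combination step, assuming that "combined" means σ = μ1 + μ2 and that normalization is min-max scaling across the sub-communities; both are assumptions, as is the hypothetical tuple layout of the input.

```python
def min_max(values):
    """Scale values linearly onto [0, 1]; a constant list maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def core_member_effect(sub_communities):
    """Return a normalized core member effect score per sub-community.

    Each item is a hypothetical tuple
    (core_votes, total_votes, core_best_answers, elite_questions).
    """
    sigmas = []
    for core_votes, total_votes, core_best, elite_questions in sub_communities:
        mu1 = core_votes / total_votes      # vote ratio of core members
        mu2 = core_best / elite_questions   # best-answer ratio of core members
        sigmas.append(mu1 + mu2)            # assumed additive combination
    return min_max(sigmas)


scores = core_member_effect([(10, 100, 5, 50), (40, 100, 10, 50), (25, 100, 0, 50)])
print([round(s, 3) for s in scores])  # [0.0, 1.0, 0.125]
```

Min-max scaling makes scores comparable only within the 27 sampled sub-communities; a different normalization (for example dividing by the theoretical maximum of 2) would change the absolute values but not the ranking.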
4.4. Collective Intelligence Effectiveness
Based on the study of knowledge communities, collective intelligence effectiveness has two aspects: group knowledge production and participation in group decision-making. Group knowledge production can be measured by the number of elite questions and the citation frequency of best answers in the sub-community. Decision-making capacity can be measured by the frequency of decisions on the topic, that is, the total number of votes. The indicators of the knowledge production dimension can be obtained directly by online collection or simple calculation: the number of elite questions and the citation frequency of best answers are collected online. The average number of votes is denoted λ, that is
The average number of votes per question, λ, is normalized to give the group decision participation score. The group knowledge production score and the group decision participation score are then combined to obtain the collective intelligence effectiveness shown in Table 7; the higher this value, the higher the level of collective intelligence. Among the selected samples, the sociology topic scores highest and the pedagogy topic scores lowest.
After all indicators were calculated, correlation analysis in SPSS was used to test the hypotheses. The results are shown in Table 8.
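Assuming λ is averaged over the elite questions of a sub-community, and writing V for the total number of votes on those questions and q for the number of elite questions as in the core member effect (V is introduced here only for illustration), the definition reads:

```latex
\lambda = \frac{V}{q}
```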
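The scoring pipeline can be sketched as follows. The text does not say how the scores are normalized or combined, so this Python version assumes min-max normalization across sub-communities and equal weights, both for the two production indicators and for combining production with decision participation. The input layout and names are hypothetical.

```python
def min_max(values):
    """Scale values linearly onto [0, 1]; a constant list maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def ci_effectiveness(sub_communities):
    """Collective intelligence effectiveness per sub-community (equal weights assumed).

    Each item is a hypothetical tuple
    (elite_questions, best_answer_citations, total_votes).
    """
    elite = min_max([e for e, _, _ in sub_communities])
    cited = min_max([c for _, c, _ in sub_communities])
    # Lambda: average number of votes per elite question, then normalized.
    decision = min_max([v / e for e, _, v in sub_communities])
    production = [(e + c) / 2 for e, c in zip(elite, cited)]
    return [(p + d) / 2 for p, d in zip(production, decision)]


print(ci_effectiveness([(10, 5, 100), (20, 15, 600), (30, 10, 300)]))
# [0.0, 0.875, 0.375]
```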
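To show how the significance markers in Table 8 relate to a correlation coefficient, here is a sketch of the standard t-test for Pearson's r with n = 27 sub-communities (25 degrees of freedom). It uses tabulated two-tailed critical values instead of computing exact p-values, and it is an illustration, not a reproduction of the SPSS output.

```python
from math import copysign, inf, sqrt

# Two-tailed critical values of Student's t with 25 degrees of freedom (n = 27).
T_CRITICAL_DF25 = {0.01: 2.787, 0.05: 2.060}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_test(xs, ys):
    """Return (r, t, marker) using the '*' / '**' convention of the tables."""
    if len(xs) != 27 or len(ys) != 27:
        raise ValueError("the critical values assume n = 27 sub-communities")
    r = pearson(xs, ys)
    # t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 = 25; |r| = 1 gives infinite t.
    t = copysign(inf, r) if abs(r) >= 1 else r * sqrt(25 / (1 - r * r))
    if abs(t) > T_CRITICAL_DF25[0.01]:
        return r, t, "**"
    if abs(t) > T_CRITICAL_DF25[0.05]:
        return r, t, "*"
    return r, t, ""
```

With 25 degrees of freedom, a correlation needs |r| of roughly 0.38 to reach the 0.05 level, which is why moderate coefficients can still be marked significant in a sample of this size.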
Table 6. Core membership effect.
Table 7. The result of collective intelligence indicators.
Table 8. SPSS correlation results for the observation indices of the model.
Note: * = 0.05 significance level; ** = 0.01 significance level.
The correlation analysis supports the hypotheses of this paper. In summary, the interactive network structure, participant characteristics, and collective intelligence effectiveness in the social network are important for measuring the emergence of collective intelligence, as previous studies have also confirmed. Building on this, the paper further analyzes the internal relations between these dimensions. The experimental results show that topic sub-communities with higher collective intelligence are characterized by high network connectivity and decentralization. We therefore conclude that a tightly connected, evenly distributed network structure promotes the emergence of collective intelligence, and that the diversity and independence of community members lead to the formation of such a structure.
From the perspective of social network analysis, this paper studied the emergence mechanism and influencing factors of collective intelligence in knowledge communities. Based on a literature review and a case study, we proposed a research model, used social network modeling to calculate the observation indices, and identified the core members of the community. Finally, based on the experimental results, we proposed a network structure that benefits community collective intelligence and summarized the emergence mechanism and influencing factors of collective intelligence from the social network perspective and from successful operating experience.
This study also has the following shortcomings. The relationships between variables, especially those in the social network dimension, need further exploration, and the relationship between the variables and their indicators needs further justification. The assumptions and indices used in the experiment need optimization, for example the exploration of network subgroups, the definition of core community members, and the measurement of member diversity and independence. The sample is somewhat small, and the reliability of the model needs further testing.
In view of these shortcomings, future research can proceed in the following directions. This article examines the emergence of collective intelligence only from the social network perspective, which is not sufficient; other research perspectives should be combined with it. Future work should improve the indicators of community network structure, member diversity and independence, and collective intelligence effectiveness; further optimize the research model; deepen the analysis of the interrelationships among influencing factors; and increase the sample size and improve data organization and cleaning to raise the credibility of the model's assumptions.