Current automatic QA frameworks have limited performance, which can be enhanced by a framework of aggregate knowledge called Community Question Answering (CQA). In a CQA system, users can ask and answer questions in different categories. A significant number of search engines and web portals, including Answer.com and Stack Overflow, offer distinct variants of CQA service. The process involves an asker posting a question in a CQA system, after which other users provide answers to it. After a certain number of answers have been gathered, the most appropriate answer can be picked (voted) by users. Questions and their associated answers are then archived in a database. The database supplements online search, as in Naver’s Ji-Sik-In (Knowledge iN), which had collected around 70 million entries. In an ideal scenario, a search engine can serve similar questions or use the best answers as search result snippets to handle similar queries. This assumes that the best answers from CQA services are good, relevant answers to the questions they are paired with.
Posting a question and getting answers to it is an important procedure in a CQA. A user posts a question by selecting a category and then entering the question subject (title) and, optionally, details (description), as shown in Figure 1.
A question remains valid in a CQA if it belongs to a category; it can then be answered, commented on and voted for. A question can only be answered by another member of the CQA, not by the asker. A question can be closed to further answers once an answer received satisfies the asker. If the asker is satisfied with any of the answers, the asker can choose it as the best answer and provide feedback, ranging from star ratings to textual comments. CQA supposes that in such cases the asker is likely satisfied with at least one of the responses, usually the one chosen as the best answer.
The components of CQA services available to users include:
1) A mechanism for question submission;
2) A complementary mechanism to deliver answers;
3) A web-based platform to facilitate users’ interactions.
In a CQA system, users can ask or answer questions on different topics. This generally attracts numerous responses to a single question. These kinds of services exhibit large-scale participation, with the inherent challenge of ensuring that all questions get timely and correct answers. Services like Yahoo! Answers and Stack Overflow organize questions using tags and categories. This makes it easier for users to answer questions according to their preferences. CQA has received increased patronage from users, and research on it spans understanding and exploiting user behaviour, similar-question retrieval, and question-quality prediction. Research on CQA services involves examining users’ experience, intentions, and the strategies by which individuals look for and share information. It may also include system development for supporting such activities.
Figure 1. Question flow pattern in a CQA.
This research leverages the characteristic advantages and limitations of existing CQA approaches to derive a model that combines votes and closeness among words to select the best answer in a CQA system, and implements the resulting model.
2. Literature Review
2.1. Overview of Some Existing CQA
Yahoo! Answers is a community-driven QA site that allows users to submit questions and answer questions from other users. Yahoo! itself was incorporated in 1995, and Yahoo! Answers was made available for general use in 2006. Members earn points, in a scheme based on Naver’s Knowledge iN, as an encouragement to participate on the platform. However, the platform is characterized by poorly formed questions and inaccurate answers.
The quality of answers given in Yahoo! Answers cannot be verified, and answers can be misleading because there is no established relationship between an answer and the votes it receives. The voting pattern in the Best Answers Recommendation Model is a means of identifying misleading information, so that the answers presented to community users are those judged relevant through votes. Figure 2 depicts the simplified lifecycle of a question in Yahoo! Answers.
E-How is based on how-to guides and consists of more than 1 million articles that provide users with step-by-step instructions. Any E-How user can comment on a submitted article, but only the article’s writer has the privilege to change its content. E-How’s content delivery approach has been criticized for producing low-quality content and for operating as a content farm: contributors are paid low rates for articles intended mainly to rank high in search results rather than to provide quality information.
Figure 2. The simplified lifecycle of a question in Yahoo! Answers (Source: Chen, 2014).
Stack Overflow focuses on a wide range of topics in Computer Science. Similar to the method of operation in Quora, users of Stack Overflow can earn points and badges. A user who needs a difficult question resolved can offer reputation points to other users as a reward (known as a bounty). Users of Stack Overflow are mostly technology enthusiasts, who are often driven by the motive of winning the game and gaining reputation points.
2.2. Related Works
Many question answering recommender systems have been developed over time, each unique in the domain addressed, the information filtered and the data set used. One related study used participant reputation to address two research questions. The authors first reviewed different link analysis schemes, with particular attention to PageRank-based methods, since these are less commonly used in user reputation modelling. They also introduced Topical PageRank analysis for modelling user reputation on different topics. Comparative experimental results on data released to the team by Yahoo! Answers show that PageRank-based approaches are more effective than HITS-like schemes and other heuristics, and that topical link analysis can improve performance. The HITS scheme identifies two important properties of a web page, hubness and authority, and proposes a mechanism to calculate them effectively. The basic idea behind HITS is that pages functioning as good hubs have hyperlinks pointing to good authority pages, and good authorities are pages to which many good hubs point. PageRank is a static ranking of web pages based on a measure of prestige in social networks; it can be seen as a random surfer model. Although the system is good at displaying the remarks of the highest-ranked users on a question, it does not take into consideration the asker’s opinion, users’ voting methods and other factors that can influence voting behaviour in the system.
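The hub/authority idea described above can be illustrated with a few lines of power iteration. This is a minimal sketch of the standard HITS update, not the implementation used in the cited study; the function name `hits`, the toy graph and the iteration count are assumptions made for illustration.

```python
import math


def hits(graph, iterations=50):
    """Run HITS power iteration on a directed graph.

    graph maps each node to the list of nodes it links to.
    Returns (hub_scores, authority_scores), each L2-normalised.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a node: sum of the hub scores of nodes linking to it.
        auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score of a node: sum of the authority scores of nodes it links to.
        hub = {n: sum(auth[t] for t in graph.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical toy graph: "a" and "b" both point to "c", and "c" points to "d".
toy_graph = {"a": ["c"], "b": ["c"], "c": ["d"]}
hub_scores, auth_scores = hits(toy_graph)
top_authority = max(auth_scores, key=auth_scores.get)
```

In this toy graph the node pointed to by two hubs ends up with the highest authority score, which is the behaviour the text attributes to HITS.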
Michael et al. developed a system that focused on question generation (QG) for the creation of educational materials for reading practice and assessment. The goal was to generate fact-based questions about the content of a given article. The top-ranked questions could be filtered and revised by educators, or given directly to students for practice. They restricted their investigation to questions about factual information in texts. This type of system makes it possible to generate thousands of questions from Wiki documents, thus producing a larger data set. The disadvantage of the system is that, despite generating multiple questions and relating them to retrieved answers, its scope is restricted to certain subject matters.
Shuo et al. aimed at enhancing question routing algorithms by improving the lasting value of answers in addition to reducing the response time. They sought to find a set of users who would collaborate to provide content with lasting value on a QA thread. To tackle this problem, they proposed a framework that captures the compatibility, availability and expertise of users. The results of the framework on Stack Overflow show that timely collaboration among users improves the lasting value of a QA thread, thereby validating the hypothesis. They also observed that different types of users have different propensities to answer and to comment. As a result, their strategy is to build separate lists of answerers and commenters. They considered comments a first-class citizen of a CQA system, since comments often critically evaluate an answer, leading to clarifications and refinements that in turn increase its overall value. Nevertheless, in the system only a specific group of users can answer a particular question. It would have been better to present the questions in an open domain, allowing interested users to comment and vote for the best answers given by any group.
A system that refines and matches questions and answers from a knowledge base, called the Question Answering Refinement (QAR) system, has also been proposed. The system relies on pre-computed word-correlation factors in a word-correlation matrix for matching archived questions and ranking answers to questions. The word-correlation factors were generated using a set of approximately 880,000 Wikipedia documents, and each correlation factor indicates the degree of similarity of the two corresponding words. One advantage of this system is that Wikipedia documents were chosen for constructing the word-correlation matrix: they were written by more than 89,000 authors with different writing styles, using various terminologies that cover a wide range of topics, and with diverse word usage and content. Furthermore, the words in the matrix are common words in the English language that appear in various online English dictionaries. However, because the archived Wikipedia documents have multiple authors, a change to them will affect the ranking and selection of the answers recommended to users.
A Question Condensing Network (QCN) that makes use of the subject-body relationship of community questions was proposed by Wu et al. In the model, the question subject is the primary part of the question representation, and the question body information is aggregated based on its similarity and disparity with the question subject. The authors proposed treating the question subject and the question body separately in community question answering. The system introduced a new method that uses a multi-dimensional attention mechanism to align question-answer pairs. The disadvantage of the system arises from the adoption and implementation of the multi-dimensional attention mechanism in aggregating the similarities between the question subject and body.
This research is motivated by the need to build a Semantic Data Source (SDS) of related questions and to apply Normalized Google Distance (NGD) to capture the relationship between questions asked, in order to reduce the response time for auto-generated answers. Integrating SDS and NGD is expected to offer a new way of presenting likely answers to newly posted questions.
3. The Proposed System
The proposed system in this work is described as follows: for each question q in a set of questions Q, with a corresponding set of answers $A_q$, there exists a group of community members V who are engaged in voting for the best answers. Each member in V selects a set of questions to consider for voting from a pool. Subsequently, for each question, each voter casts a vote for only one of the answers that the question received.
For a question q on which a voter $v_i$ chooses an answer $a_i$, the voter score $r_i$ is obtained as in Equation (1):
where ; designates the number of users voting on answers to questions in Q; designates an instance of question considered for voting.
This CQA application is made up of five (5) processing stages, as shown in Figure 3. These stages are: Activity Panel, Process Control Panel, Algorithm Panel, Database and Content Preview.
Figure 3. System architecture and process design.
3.1. Activity Panel
The activity panel is the upper layer of the application; its main focus is to provide authentication security for the CQA system. The layer presents to users all available questions and answers related to the category they select.
3.2. Process Control Panel (PCP)
PCP is the application layer that handles all requests and content parsing that involve the database. As a security measure, it allows the application to control movement of data, data authenticity, user control, content control, content filtering and integrity control (as shown in Figure 4).
3.3. Algorithm Panel
Algorithm panel has two different layers used to process every operation of this CQA.
The algorithms are:
1) Brouwer Fixed Point (BFP) Algorithm;
2) Normalized Google Distance (NGD).
3.3.1. Brouwer Fixed Point (BFP) Algorithm
Brouwer’s fixed-point theorem is a fixed-point theorem in topology, named after L. E. J. (Bertus) Brouwer. It states that for any continuous function F mapping a compact convex set to itself there is a point $x_0$ such that $F(x_0) = x_0$. The simplest form of Brouwer’s theorem is for continuous functions F from a closed interval I in the real numbers to itself. The Brouwer Fixed Point Theorem is applied to determine the best answer for question q based on the resulting distribution of votes across all the answers in $A_q$.
Consider a question q with two different voters, $v_i$ and $v_j$, each choosing an answer, $a_i$ and $a_j$ respectively. The total numbers of votes, $R_i$ and $R_j$, cast for $a_i$ and $a_j$ are:
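To make the one-dimensional case of the theorem concrete, the sketch below locates a fixed point of a continuous function that maps a closed interval into itself. It is an illustration only and not part of the proposed scoring; the function name `find_fixed_point`, the choice of bisection and the example function cos on [0, 1] are assumptions.

```python
import math


def find_fixed_point(F, lo, hi, tol=1e-12):
    """Find x0 in [lo, hi] with F(x0) = x0 by bisection on g(x) = F(x) - x.

    Assumes F is continuous and maps [lo, hi] into itself, so that
    g(lo) >= 0 and g(hi) <= 0 and a sign change is guaranteed.
    """
    def g(x):
        return F(x) - x

    if g(lo) == 0:
        return lo
    if g(hi) == 0:
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid  # fixed point lies to the right of mid
        else:
            hi = mid  # fixed point lies at mid or to its left
    return (lo + hi) / 2


# cos maps [0, 1] into [cos(1), 1], which is contained in [0, 1].
x0 = find_fixed_point(math.cos, 0.0, 1.0)
```

The returned value satisfies cos(x0) = x0 up to the tolerance, which is the existence statement of the theorem for a closed interval.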
Figure 4. Process control panel block diagram.
where n > 0.
where n > 0.
The system also provides the total number of users Uqi voting on a question denoted as |U|.
Summarizing Equation (1) to find the Fixed-Point value of the voters score ri and rj:
We have the voter score with and as:
The Fixed-Point Score (FPS) value is given as:
is the fixed point for function F.
To simplify Equations (4) and (7), we have:
To determine the Answer Score for the selection above, we must specify the Fixed-Point Scoring (FPS) of individual answers based on the distribution of votes across answers and the scores of voters who cast the votes. Given a question q and its corresponding set of answers:
where is the size of , we calculate FPS as:
For each question q we rank the answers according to their FPS and set the highest scoring answer as the FPS best answer.
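The ranking step can be sketched as follows. Because the scoring equations are not reproduced here, this sketch assumes one simple form: an answer's score is the sum of the scores of the voters who chose it, divided by the total score of all voters on the question. The function name `rank_answers`, the normalisation and the example data are hypothetical and should not be read as the paper's exact FPS definition.

```python
from collections import defaultdict


def rank_answers(votes, voter_score):
    """Rank the answers to one question by voter-weighted votes.

    votes maps each voter to the single answer that voter chose.
    voter_score maps each voter to a non-negative score.
    Returns a list of (answer, score) pairs, highest score first.
    """
    totals = defaultdict(float)
    for voter, answer in votes.items():
        totals[answer] += voter_score[voter]
    # Assumed normalisation: divide by the combined score of all voters.
    grand_total = sum(totals.values()) or 1.0
    scored = [(answer, total / grand_total) for answer, total in totals.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: four voters, three candidate answers.
votes = {"v1": "a1", "v2": "a2", "v3": "a2", "v4": "a3"}
voter_score = {"v1": 0.9, "v2": 0.3, "v3": 0.4, "v4": 0.1}
ranking = rank_answers(votes, voter_score)
best_answer = ranking[0][0]
```

In this example the answer chosen by a single high-scoring voter outranks the answer with two votes from lower-scoring voters, which shows why the scores of the voters matter as well as the raw vote counts.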
3.3.2. Normalized Google Distance (NGD)
There is always a need to measure the distance or relationship between different words in the scope of this work. Shannon information theory was introduced to provide a means of measuring information. More precisely, the amount of information in an object may be measured by its entropy, which may be interpreted as the length of the description of the object under some encoding.
The application adopted the mathematical model, based on Google search counts, for measuring the relationship between words in indexed pages. This mathematical model is based on Kolmogorov complexity. The classical notion of Kolmogorov complexity is an objective measure of the information in a single object, and information distance measures the information between a pair of objects. Given two search terms x and y passed to the NGD engine, the search engine discovers the meaning of words and phrases relative to other words and phrases, producing a relative semantics between x and y.
This is given by
$$\mathrm{NGD}(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}$$
where f(x) denotes the number of returned data records containing occurrences of x, f(x, y) denotes the number of records containing occurrences of both x and y, and N denotes the total number of records saved into the database or indexed for the occurrence of x and y.
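The distance can be computed directly from the three counts and N defined above. The sketch below uses the standard NGD expression, (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))); the function name `ngd`, the handling of terms that never co-occur and the example counts are assumptions for illustration.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from record counts.

    fx, fy: number of records containing x and y respectively.
    fxy:    number of records containing both x and y.
    n:      total number of indexed records (must exceed fx and fy).
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term must occur in at least one record")
    if fxy == 0:
        # Assumed convention: terms that never co-occur are infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Illustrative counts only; they are not taken from the system's database.
distance = ngd(fx=46_700_000, fy=12_200_000, fxy=2_630_000, n=8_058_044_651)
identical = ngd(fx=100, fy=100, fxy=100, n=10_000)
```

A value near 0 means the two terms almost always occur together, and larger values mean they are less related.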
Let X denote a finite multiset of n finite binary strings defined by . We use multisets and not just a set, since in a set all elements are different while here, we are interested in the situation where some or all of the elements are equal.
Information distance is defined in by:
For the Google Distribution computation, we have the following:
Let the set of singleton Google search terms be denoted by S and . If a set search word has n singleton search term then, there are such set search terms.
There are set search terms consisting of n non-identical terms and hence:
Let X be a multiset of search terms defined by with for and X be the set of such x.
The application environment (as described in Figure 5) is divided into different programming modules; each module performs different functions depending on the view being passed. The available modules are shown in the block diagram in Figure 6.
User Information Record Base
In the CQA application, there is a database of users in order to enforce security and to provide an environment where robots are not allowed to post questions or answers in place of humans. The record base is divided into two:
1) Login Record Base;
2) Personal Data Record Base.
Figure 5. Application description.
Figure 6. System module block diagram.
These two record bases are linked by a unique ID that provides a one-to-one mapping, as shown in Table 1. Table 2 and Table 3 describe the types of users in the system and their levels of privilege, respectively. A sample of users and their privileges is presented in Table 4.
A user can be administrator, member or content moderator.
4. Results and Discussion
The proposed system was implemented with HyperText Markup Language 5 (HTML5), Cascading Style Sheets 3 (CSS3), jQuery, AJAX and PHP. Figure 8 and Figure 9 present screenshots of some of the modules.
The system has different modules that perform different operations depending on the view. The authentication module grants access to authenticated users before they ask or answer questions. The Question and Answer Module handles the questions asked and the answers each question receives from members. The Post Module allows logged-in (authenticated) users to post questions. This module has a rich text (WYSIWYG) editor embedded in it.
Figure 7. Normalized Google distance table.
Table 1. One-to-one relationship between login and user information database table.
Table 2. Different types of users of the CQA system.
Table 3. Users privileges in the domain.
Table 4. Typical users with different privileges.
Table 5. Voting relationship among users, questions and answers.
Table 6. Mapping analysis of questions and answers to users.
Figure 8. Answer selection module.
Figure 9. Answer recommendation module.
This module is attached to the Category Module Manager, which fetches question categories from the data source into this page. The Content Filtering Module is also attached to this layer; it suggests related questions with answers to the asker, where available, based on the words or sentences typed into the input box. The User Manager Module manages all users registered on the application, ranging from system administrator to ordinary member. The user module is visible to all except guest users. Registered users can access the users page and search for members. The Suggestion Module displays related questions to the asker on key press. Clicking a suggested question links to its answer. This module is provided to reduce or eliminate the waiting time for an answer to a question. Other modules are the Normalized Google Distance Module, whose function is shown in Figure 7, and the Best Answer Recommendation Module, consisting of Answer Selection (Figure 8) and Answer Recommendation (Figure 9).
Normalized Google Distance Table
Figure 7 shows the relationship between words used in the CQA system, computed with the Normalized Google Distance.
The Normalized Google Distance module uses this table, together with other tables, to make suggestions based on content count and word mapping.
4.1. Performance Evaluation
The performance of best answer selection was evaluated using standard metrics: Reciprocal Rank (RR), Mean Reciprocal Rank (MRR) and Discounted Cumulative Gain (DCG). The data set generated from users’ interaction with the developed system, and the criteria for evaluating answer selection in a CQA system, are also discussed.
4.1.1. Data Set
Experimental evaluation was carried out on a data set collected within and outside the developed system; the data was generated from users’ interaction with the CQA system over time. 400 students of Adekunle Ajasin University, Akungba-Akoko, Nigeria, participated, generating more than 600 unique questions and a series of answers mapped to every question asked in real time. The data consists of the individual questions and answers generated from the interaction and the votes received from participating members of the community.
4.1.2. Evaluation Metrics
The system evaluation was carried out using three different metrics (Reciprocal Rank (RR), Mean Reciprocal Rank (MRR) and Discounted Cumulative Gain (DCG)) to test the effectiveness of the developed system.
1) Reciprocal Rank (RR) and Mean Reciprocal Rank (MRR)
The mean reciprocal rank is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer. The mean reciprocal rank is the average of the reciprocal ranks of the results for a sample of queries Q:
$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}$$
where $\mathrm{rank}_i$ refers to the rank position of the first best answer for the i-th query.
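The metric can be computed in a few lines. The sketch below follows the standard definition of MRR as the mean of 1/rank over the queries; the function name `mean_reciprocal_rank` and the example ranks are hypothetical and are not the values behind the system's reported result.

```python
def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over queries.

    ranks holds, for each query, the 1-based position of the first
    best answer in the ranked list returned for that query.
    """
    if not ranks:
        raise ValueError("at least one query is required")
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Hypothetical ranks for three queries: best answer at positions 1, 2 and 4.
example_ranks = [1, 2, 4]
mrr = mean_reciprocal_rank(example_ranks)
```

Because every reciprocal rank is at most 1, the mean can never exceed 1, and it equals 1 only when the best answer is ranked first for every query.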
Table 7 shows the reciprocal rank of selected answers. The Mean Reciprocal Rank (MRR) of the best answer(s) selected regarding the asker agreement with any selected answers from the pool of answers given is 0.61, which is a fair value for this evaluation.
The MRR of a system is at most 1, that is, $\mathrm{MRR} \le 1$.
It was observed that the system performs better when a limited number of answers is selected, taken from the highest rank to the lowest.
Therefore, MRR is given thus:
Table 8 shows that performance decreases when more results are recommended.
Table 7. Answer ranking table with questions and user votes.
Table 8. Ideal answer ranking table with questions and user votes.
2) Discounted cumulative gain (DCG)
Discounted cumulative gain (DCG) is a measure of ranking quality. In information retrieval, it is often used in measuring the effectiveness of web search engine algorithms or related applications. Using a graded relevance scale of documents in a search engine result set, discounted cumulative gain measures the usefulness or gain of a document based on its position in the result list. The gain is accumulated from the top of the result list to the bottom, with the gain of each result discounted at lower ranks, as shown in Table 9.
where p denotes the rank position and $rel_i$ returns the relevance of the vote at position i.
Also, the idealized discounted cumulative gain (IDCG) is used in normalizing the discounted cumulative gain (DCG). IDCG is the DCG obtained under the basic assumption that items are ordered by decreasing relevance.
Thus the normalized discounted cumulative gain (nDCG) is given by:
$$\mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}$$
The relevance scores provided across the given answers are: 3, 4, 6, 2, 1, 0.
DCG is used to emphasize that highly relevant answers should appear early in the list.
Table 10 displays the Idealized Discounted Cumulative Gain.
Table 9. Discounted cumulative gain.
Table 10. Idealized discounted cumulative gain.
To normalize DCG values, the ideal ordering of the results for the query is expected to be 6, 4, 3, 2, 1, 0.
DCG of ideal ordering or IDCG is given as:
With a perfect ranking algorithm, the DCG will be the same as the IDCG, producing an nDCG of 1.0. All nDCG values are relative values on the interval 0.0 to 1.0.
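The normalization can be worked through on the relevance scores listed above (3, 4, 6, 2, 1, 0) and their ideal ordering (6, 4, 3, 2, 1, 0). The sketch below assumes the common logarithmic discount, rel_i / log2(i + 1); the discount actually used for Table 9 and Table 10 may differ, so the numbers it produces are not claimed to match those tables. The function names `dcg` and `ndcg` are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain, assuming the discount rel_i / log2(i + 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the same scores sorted in decreasing order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


scores = [3, 4, 6, 2, 1, 0]              # relevance scores in ranked order
ideal_scores = sorted(scores, reverse=True)  # 6, 4, 3, 2, 1, 0

dcg_value = dcg(scores)
idcg_value = dcg(ideal_scores)
ndcg_value = ndcg(scores)
```

Under this assumed discount, the ranked list scores below the ideal ordering because the most relevant answer (score 6) appears third rather than first; an already ideal ordering gives an nDCG of exactly 1.0.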
Two assumptions are made in using DCG and its related measures: firstly, highly relevant items are more useful when appearing earlier in a search engine result list (have higher ranks), and secondly, highly relevant items are more useful than marginally relevant items, which are in turn more useful than irrelevant items.
Thus DCG, normalized using IDCG, measures the degree to which the ranked items meet the users’ choice; the higher the value, the better.
The Vote Score and corresponding DCG values are shown in Figure 10.
4.2. Comparison of Result Obtained with Related Works
Table 11 and Figure 11 present a comparison of the MRR of our proposed system with that of the Question Answering Refinement (QAR) system. We took the final MRR reported for QAR on the data set it used to recommend the best answer to the asker and compared it with the MRR obtained by our system. The comparison shows that the proposed system performs better than QAR.
Table 11. Result comparison table with related work.
Figure 10. Vote and point distribution.
Figure 11. Comparison of proposed result and related work.
5. Conclusion
In this work, a web-based Question Answering System that displays and recommends the best answer to the user was developed. The system allows users to ask questions and to receive answers to the questions asked. The work distinguishes different categories of users, from Registered Member to Member as a Moderator. This division of users helped the system to manage the integrity of content and information delivery. The system adopted two algorithms: Normalized Google Distance (to suggest relevant answers to the user) and the Brouwer Fixed Point theorem (to calculate the voting score on answers received). The Fixed-Point Scoring (FPS) of individual answers was derived from the distribution of votes across answers and the scores of the voters who cast the votes, giving the Answer Score (AS). The results obtained and plotted show that highly ranked answers meet the user’s choice and appear earlier in searches. This Question Answering System will help community users to find relevant answers to questions quickly. Its knowledge base is expected to keep growing and to supply highly relevant information to members of the QA community.
 Anietie, A., Satoshi, S., Mugizi, R. and Mark, D. (2016) Name Variation in Community Question Answering Systems. In Proceedings of the 2nd Workshop on Noisy User-generated Text, Osaka, Japan, 12 December 2016, 51-60.
 Liu, Y.J., Li, S.S., Cao, Y.B., Lin, C.Y., Han, D.Y. and Yu, Y.J. (2008) Understanding and Summarizing Answers in Community-Based Question Answering Services. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING), Manchester, UK, 18-22 August 2008, 497-504.
 Agichtein, E., Castillo, C., Donato, D., Gionis, A. and Mishne, G. (2008) Finding High-Quality Content in Social Media. Proceedings of the 2008 International Conference on Web Search and Data Mining, Phoenix, February 2008, 183-194.
 Liangjie, H. and Davison, B.D. (2009) A Classification-Based Approach to Question Answering in Discussion Boards. Proceedings of the 32nd Annual Int’l ACM SIGIR Conferences on Research and Development in Information Retrieval, Boston, July 2009, 171-178.
 Michael, H. and Noah, S. (2010) Good Question! Statistical Ranking for Question Generation. Proceedings of Human Language Technologies Conference of the North American Chapter of the Association of Computational Linguistics, Los Angeles, California, USA, 2-4 June 2010, 609-617.
 Shuo, C. and Aditya, P. (2013) Routing Questions for Collaborative Answering in Community Question Answering. ASONAM 13 Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Ontario, August 2013, 494-501.
 Wu, W., Sun, X. and Wang, H.F. (2018). Question Condensing Networks for Answer Selection in Community Question Answering. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), Melbourne, July 2018, 1746-1755.
 Grunwald, P. and Vitanyi, P. (2003) Kolmogorov Complexity and Information Theory: With an Interpretation in Terms of Questions and Answers. Journal of Logic, Language, and Information, 12, 497-529.
 Bennett, D. (1998) Sense-Making Theory and Practice: An Overview of User Interests in Knowledge Seeking and Use. Journal of Knowledge Management, 2, 36-46.