In recent years, in the era of big data, libraries have creatively applied technologies such as big data and artificial intelligence to explore the potential needs of readers, and have established a multi-dimensional interaction model between libraries and readers in order to provide more personalized and humanized services. At present, more and more libraries publish annual reading reports through their official WeChat public accounts or library homepages. The authors take the academic libraries of the “double first-class” universities (world first-class universities and first-class disciplines, hereafter “double-class”) announced in China in September 2017 as an example, and investigate and analyze the 2017 annual reading reports issued by the 42 academic libraries. According to the survey results, 29 academic libraries published 2017 reading reports through their official WeChat public accounts or library homepages, accounting for 69.05% of the total. The annual reading report reflects the utilization of the library’s literature resources and space services over the past year. It draws on the annual book borrowing lists, analyzes the library’s reader data, and presents various types of data in a simple and vivid way. Reader data is multi-source, massive, heterogeneous, and rapidly changing. By building the annual reading report of the academic library on personas, the library can mine its various reading data in depth and use collections of labels to construct the personas. The resulting model gives an intuitive view of readers’ reading tendencies and helps librarians carry out accurate reading promotion services.
Personas are built from massive data by extracting the overall information related to users, including inherent attributes such as names, ages, and genders, as well as reading habits. Such information provides the basis for further analyzing users’ behavior habits with big data and for more accurate user targeting and personalized services. The concept of personas was first proposed by Alan Cooper (the father of interaction design): “Personas are a concrete representation of target users”. That is, a persona is a virtual representation of a real user. Research on personas is more extensive in foreign libraries than in China, and foreign applications are gradually maturing. On July 6th, 2018, the authors of this article searched for the topics “user profile” or “Persona” in the Summon Foreign Language Resource Discovery System, with the subject area limited to library and information science. A total of 102 articles were retrieved, of which 70 were relevant to the subject after filtering by content. The research interests of these articles focused mainly on theories, concepts, algorithms, techniques, persona modeling, practices and applications. For example, Teixeira C, Pinto J S, and Martins J A provide a new approach by which user modelling applications can be integrated into any organization without the cost of reengineering the entire information system already in place. Silvia Rossi, François Ferland, and Adriana Tapus review the current literature and introduce a general classification scheme for both the profiling and the behavioral adaptation research topics in terms of physical, cognitive, and social interaction viewpoints.
In practical applications, personas first appeared in the field of foreign libraries in the mid-1980s, and were applied in the British National Bibliography and Blaise-line (one of the first online services in Europe) for service optimization. In American university libraries, the annual report is also an important part of library management. The earliest library to release an annual report was the Yale University Library, which has published its annual reports since 1899. The main contents of an American university library annual report include the letter written by the library director to the readers, mission statements, major achievements, annual hotspots, exhibition activities, financial work, statistics, gifts and donations, etc.
A literature review shows that there are few studies on personas in the library field in China. In terms of theoretical research, personas and labeling systems are widely used in data statistics, data analysis, and big data. With the development of smart libraries and the popularization of artificial intelligence, personas have gradually become a hot issue in library research over the past two years. Searching the CNKI database with “user profile” or “Persona” as the keyword retrieved 11 related articles, with publication dates concentrated in 2017-2018. The research and application fields mainly followed two modes: constructing a complete reader portrait model or user behavior model through data collection and completion; and establishing a persona (or user behavior model) for personalized or precision library services, focusing on how to use the persona or user behavior model to serve the library or the reader. In addition, the authors surveyed the 2016-2018 Annual Projects of the National Social Science Fund of China (including key projects, youth projects and general projects) issued by the National Office of Philosophy and Social Science Planning. The research topic “Persona” has only one hit, in 2018; the theme of that project is “Research on Library User Portraits of Multi-source Heterogeneous Data Fusion”. Therefore, personas for annual reading reports have not yet been well investigated in China.
In terms of practical application, an extensive survey of the 2017 reading reports published by university libraries found that, among the “double-class” university libraries in China, only the reading reports from the Peking University Library and the Sun Yat-sen University Library revealed readers’ data through personas. These reading reports identified different reading tendencies among different readers according to their physical characteristics, and reported a positive correlation between font size and reader preference.
Supported by big data technology, artificial intelligence and other technologies, with the goal of intelligent services, the advantages of applying personas to academic library annual reading reports are as follows:
First, personas reveal the content of the reading report through visual effects. Professor Huanwen Cheng, the director of the library of Sun Yat-sen University, mentioned in a report during the 2018 China Academic Library Development Forum: “Reading promotion is fundamental for the construction of a first-class university.” The academic library reading report acts as a beacon for reading in universities. The report can help libraries accurately understand the reading status of university staff and students, carry out reading promotion according to factual data, and actively promote a reading atmosphere on a scholarly campus. At present, the reader behavior data and library resource data reflected in the annual reading reports of China's libraries are relatively flat. The user portraits that appear in annual reading reports are mostly limited to popular books, authors, etc., which is relatively simple. Further analysis could approach the data from different angles, subdividing user group behavior and fully displaying the library's annual data through user personas, resource personas, etc.
Second, by constructing a persona label system, deep mining and profiling of reading data can be performed. For personas, a higher “pixel” count, that is, a finer level of detail, better reflects the user data: the higher the pixel count, the easier it is to deliver accurate library services. In constructing the label system of the annual reading report personas, the library should deeply explore readers’ basic personal data and reading behavior data, pay attention to the attachment and association of readers with the resource data after modeling, and carry out category management, cross-analysis, and iterative analysis of the multi-layer data labels. Only by these means can the library generate a complete data labeling system that reflects reading report data at a much deeper level.
Third, the persona results in the reading report make it convenient to carry out personalized and proactive reading promotion services. At present, most library reading reports lack an analysis of readers’ reading trends. According to the authors’ investigation, taking the “double-class” university libraries as an example, the content of the reading report data displayed by each university differs. Common annual reading reports mainly include the library's admission volume, borrowing volume, resource construction and utilization, space utilization, various types of reader service and reading promotion statistics, book ranking data, various star reader data, WeChat statistics, etc. The data mining in most reading reports stops at revealing admission volume, book borrowing and similar content, and lacks deep mining of the data. Readers’ reading tendencies can be deeply analyzed by constructing personas for various groups. This could help reveal readers' reading tendencies and reading behaviors, as well as help analyze problems that emerge during the reading promotion process. Thus, applying personas to academic library annual reading reports can better guide reading, help libraries carry out targeted reading activities, and increase participation rates in library reading promotion activities.
To introduce personas, which are widely used in the commercial field, into the annual reading report of the library and to present readers’ reading tendencies by visual means, it is necessary to use technologies such as big data and artificial intelligence to capture the readers’ reading data and construct a persona model. By these means, the library can accurately understand the readers’ information needs, provide more accurate reading promotion services, and offer different service strategies for readers under different models. The construction of the persona model mainly includes data collection, construction of the label system and data model, iterative analysis of the label system, generation of portraits, etc. The user portraits are then used to characterize the reading tendencies of typical reader groups in universities. The process is shown in Figure 1.
3.1. Data Collection
The authors conducted an internet survey of the 2017 annual reading reports published by Chinese university libraries. From the common annual reading reports, data such as the libraries’ admission volumes, borrowing volumes, resource construction and utilization, space utilization, various types of reader service volumes (such as training lectures, subject services, etc.), reading promotion statistics, book ranking data, various reader ranking data and WeChat statistics were collected. Book borrowing volume, admission volume and rankings constitute the main contents of university reading reports. According to readers’ needs from the reading report, combined with the characteristics of user portraits, the relevant data for constructing the user portraits of the library annual reading report is preliminarily compiled (as shown in Figure 2). These data can be acquired through multiple channels, such as the library integrated management system, the access control system, and the library business statistics platform, with each type of required data obtained from the corresponding statistical platform.
3.2. Behavior Modeling
User portrait data can generally be grouped into two dimensions: static portrait data and dynamic portrait data. The static portrait data of the library’s annual reading report comprises two kinds: first, resource metadata (including resource attribute information) and related resource utilization data, such as popular books, publishers, authors, and borrowing methods; second, reader attribute data, such as name, gender, and age. The dynamic portrait data of the annual reading report is the readers’ behavior data, such as borrowing volume, admission volume, number of WeChat subscriptions, number of site hits, and the time periods in which readers use the library, as shown in Figure 3.
Figure 1. Construction flowchart of the user portrait of the library reading report.
Figure 2. Constructing the user portraits of the annual reading report of the library.
By deeply mining the library's reader data, extracting the static and dynamic portrait data, and constructing models with clustering, filtering, feedback and weighting algorithms, libraries can conduct iterative analysis of the label system. After further data filtering, elimination of redundant data and analysis of user tags, the user behavior modeling is complete.
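The split into static and dynamic portrait data, followed by redundancy elimination and tagging, can be illustrated with a minimal Python sketch. All class names, field names, tag names and thresholds here are hypothetical and chosen only for illustration; they are not taken from any particular library system.

```python
from dataclasses import dataclass, field


@dataclass
class StaticProfile:
    """Attributes that rarely change (hypothetical field names)."""
    reader_id: str
    gender: str
    age: int


@dataclass
class DynamicProfile:
    """Behavior counts for one year (hypothetical field names)."""
    loans: int = 0
    visits: int = 0
    site_hits: int = 0


@dataclass
class ReaderRecord:
    static: StaticProfile
    dynamic: DynamicProfile
    tags: set = field(default_factory=set)


def deduplicate(records):
    """Keep the first record seen for each reader_id, dropping redundant rows."""
    seen = {}
    for rec in records:
        seen.setdefault(rec.static.reader_id, rec)
    return list(seen.values())


def assign_tags(record, loan_threshold=50, visit_threshold=100):
    """Attach simple behavior tags; the thresholds are illustrative only."""
    if record.dynamic.loans >= loan_threshold:
        record.tags.add("heavy borrower")
    if record.dynamic.visits >= visit_threshold:
        record.tags.add("frequent visitor")
    return record


records = [
    ReaderRecord(StaticProfile("r1", "F", 20), DynamicProfile(loans=60, visits=30)),
    ReaderRecord(StaticProfile("r1", "F", 20), DynamicProfile(loans=60, visits=30)),
    ReaderRecord(StaticProfile("r2", "M", 22), DynamicProfile(loans=5, visits=150)),
]
tagged = [assign_tags(rec) for rec in deduplicate(records)]
```

In a real deployment the tag rules would presumably come out of the iterative analysis of the label system rather than from fixed thresholds.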
3.3. Build Persona
The reader persona in the annual reading report of the library is a persona of a reader group. The reader group model must be analyzed along multiple dimensions, and all readers are grouped by associated clustering according to their behavior data, forming different types of reader group personas or resource personas. An example of a subdivided reader group is shown in Figure 4.
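The text does not specify which clustering method is used to group readers. As one possible illustration, the sketch below applies plain k-means (a swapped-in, generic technique) to hypothetical pairs of annual loans and visits. The data, the choice of two clusters and the seeding with the first k points are all assumptions made for the example.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignment[i] = distances.index(min(distances))
        # Move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Hypothetical (loans, visits) per reader for one year.
readers = [(2, 5), (3, 8), (40, 90), (45, 110), (4, 6), (50, 100)]
groups, centers = kmeans(readers, k=2)
```

With this toy data the low-activity and high-activity readers end up in separate groups. With real data, features on different scales would normally be standardized first, and the number of groups would need to be chosen with care.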
Figure 3. Reader behavior modeling in the library annual report.
After the reader group model has been built, the personas are constructed according to the weights of the reader behavior tags. The construction, improvement and application of personas depend on the support of algorithms and technologies, such as classification, filtering, feedback, and weighting algorithms. A common algorithm for weighting persona tags is TF-IDF, a statistical method used to evaluate the importance of a word to one of the files in a file set or corpus. The importance of a word increases in proportion to the number of times it appears in the file, and decreases with the frequency at which it appears across the corpus. Here we use $w(P, T)$ to indicate the number of times a tag $T$ is used to mark the user $P$, and $TF(P, T)$ to indicate the proportion of that count among all tags of the user $P$, as follows:
$$TF(P, T) = \frac{w(P, T)}{\sum_{i} w(P, T_i)}$$
Note: $T_i$ ∈ all tags of the reader or resource.
The corresponding $IDF(P, T)$ indicates the scarcity of the tag $T$ among all tags; it falls as the probability of occurrence of this tag rises. The formula is as follows:
$$IDF(P, T) = \log \frac{\sum_{j} \sum_{i} w(P_j, T_i)}{\sum_{j} w(P_j, T)}$$
Note: $P_j$ ∈ all readers or resources.
Then, the weight of each reader group label is obtained as the product TF × IDF. Finally, a persona of a specific group can be outlined, and the associated data is presented in the form of a persona, as shown in Figure 5.
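The TF × IDF tag weighting can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: TF is taken as a tag's count divided by the total tag count of that reader group, IDF as the natural logarithm of the total tag count over all groups divided by the total count of that tag, and the group names, tag names and counts are invented for the example.

```python
import math


def tag_weights(tag_counts):
    """Return {reader: {tag: TF * IDF}} from {reader: {tag: count}}.

    TF(P, T)  = w(P, T) / sum of w(P, Ti) over all tags Ti of reader P
    IDF(P, T) = log(total tag count over all readers / total count of tag T)
    """
    total = sum(sum(tags.values()) for tags in tag_counts.values())
    per_tag = {}
    for tags in tag_counts.values():
        for tag, count in tags.items():
            per_tag[tag] = per_tag.get(tag, 0) + count

    weights = {}
    for reader, tags in tag_counts.items():
        reader_total = sum(tags.values())
        weights[reader] = {
            tag: (count / reader_total) * math.log(total / per_tag[tag])
            for tag, count in tags.items()
        }
    return weights


# Hypothetical tag counts for two reader groups.
counts = {
    "group_a": {"literature": 8, "history": 2},
    "group_b": {"literature": 6, "programming": 4},
}
weights = tag_weights(counts)
```

In this toy data, "history" receives a higher weight than "literature" for group_a even though it is used less often, because "literature" is common to both groups and therefore less distinctive.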
Figure 4. Reader group subdivision model.
Figure 5. Library reading report: user portrait label.
3.4. Visual Presentation and Precise Service
At present, the personas embodied in the annual reading report of the library mainly reveal readers’ reading behaviors. A persona model presented through deep analysis and visualization helps the library carry out reading promotion and intelligent recommendation services for a given group of readers. Specifically, the portrait results of a reader group can be subdivided so that active readers, ordinary readers and potential readers can be explored in depth. The reading behavior and reading tendency of a specific reader group can be analyzed, and existing problems in the reading promotion process can be found. Based on a precise description of individual users or user groups, combined with knowledge discovery technologies, the library can establish multiple associations among users with similar knowledge needs, interest preferences, usage habits and activity levels, forming portrait-based user relationship maps that reveal deeper value. As shown in Figure 6, based on the various types of reading data of different reader groups, knowledge graph technology can be applied to the resulting portraits to establish knowledge associations and interactions among users, and to find the similar information needs and potential needs of various reader groups, revealing deep reading tendencies so that libraries can carry out various reading promotion services more accurately.
Figure 6. Knowledge relevance in the user portraits of the library reading reports.
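One simple way to establish associations among users with similar interests is to compare their tag-weight vectors. The sketch below uses cosine similarity, a generic technique chosen for illustration rather than a method named in the text. The reader names, tag weights and the 0.5 threshold are assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse tag-weight dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relationship_edges(profiles, threshold=0.5):
    """Link every pair of readers whose similarity reaches the threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        score = cosine(profiles[a], profiles[b])
        if score >= threshold:
            edges.append((a, b, round(score, 3)))
    return edges


# Hypothetical tag weights for three readers.
profiles = {
    "reader_1": {"history": 0.9, "literature": 0.1},
    "reader_2": {"history": 0.8, "literature": 0.3},
    "reader_3": {"programming": 1.0},
}
edges = relationship_edges(profiles)
```

Here only reader_1 and reader_2 are linked, since reader_3 shares no tags with either. The resulting edge list could feed a graph visualization such as a user relationship map.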
The study of the library's annual reading report has guiding significance for much of the library's work, because only by grasping the objective information needs of readers can document service work be done in a targeted manner. With the advent of the era of big data, the National 13th Five-Year Plan outlines the implementation of the National Big Data Strategy. Personas, as one of the tools for achieving accurate services in the era of big data, can help libraries judge, analyze, and predict readers' information demands and mine readers' potential information needs, so that the library can provide readers with more personalized, proactive and humanized intelligent services. This paper proposes introducing personas into the reading reports of university libraries and, drawing on the development experience of user portraits in the fields of computing, e-commerce and foreign libraries, puts forward design ideas for annual reading reports based on user portraits in Chinese university libraries. For the library field in China, it provides constructive suggestions for the mining, integration and classification of library data from the perspective of personas.
Guangdong Provincial Library Association Project (GDTK1809): Research on the annual reading report of academic libraries based on personas.