With more and more data available on the web, how users search for and discover content is of crucial importance. There is growing research on interaction paradigms investigating how users may benefit from the expressive power of semantic web standards.
The semantic web may be defined as the transformation of the worldwide web into a database of linked resources, where data may be widely reused and shared. Web services can be enhanced by drawing on semantically aware data made available by a variety of providers. In addition, as information discovery needs become more and more challenging, traditional keyword-based information retrieval methods are increasingly falling short in providing adequate support. This retrieval problem is compounded by the poor quality of the metadata content in some digital collections.
A software ecosystem (SECO) is defined as the interaction of a set of actors on top of a common technological platform providing a number of software solutions or services. In a SECO, internal and external actors create and compose relevant solutions together with a community of domain experts and users to satisfy customer needs within specific market segments. This poses new challenges since the software systems providing the technical basis of a SECO are being evolved by various distributed development teams, communities and technologies.
There is growing agreement on the general characteristics of SECO, including a common technological platform enabling outside contributions, variability-enabled architectures, tool support for product derivation, as well as development processes and business models involving internal and external actors. At least ten SECO characteristics have been identified that focus on technical processes for development and evolution (see Table 1).
Table 1. SECO characteristics.
Gawer and Cusumano  have analyzed a wide range of industry examples of SECO and identified two predominant types of platforms:
1. Internal platforms (company or product): defined as a set of assets organized in a common structure from which a company can efficiently develop and produce a stream of derivative products.
2. External platforms (industry): defined as products, services, or technologies that act as a foundation upon which external innovators, organized as an innovative business ecosystem, can develop their own complementary products, technologies, or services.
Indeed, the new generation of SECO must be an integration of multi-platforms (internal and external) that allows the interaction of a set of internal and external actors.
Concurrently, modern software demands more and more adaptive features, many of which must be performed dynamically. In this context, a collaborative platform is important in order to coordinate collaborative and distributed environments for development of SECO platforms.
Furthermore, as the requirement for SECO to support the adaptation capabilities of systems is increasing in importance, it is recommended that such adaptive features be included within software product lines (SPL). The SPL concept appeals to software development organizations because it aims to provide a comprehensive model for building applications based on a common architecture and core assets.
SPLs have been used successfully in industry for building families of systems of related products, maximizing reuse, and exploiting their variable and configurable options  .
SPL development can be divided into three interrelated activities:
1. Core assets development: may include architecture, reusable software components, domain models, requirement statements, documentation, schedules, budgets, test plans, test cases, process descriptions, modeling diagrams, and other relevant items used for product development.
2. Product development: represents activities where products are physically developed from core assets, based on the production plan, in order to satisfy the requirements of the SPL  .
3. Management: involves the essential processes carried out at technical and organizational levels to support the SPL process and ensures that the necessary resources are available and well coordinated.
To develop and implement SPL, the literature proposes several SPL frameworks using a variety of component-based software development (CBSD) approaches:
1. COPA (component-oriented platform architecting): an SPL framework that is component-oriented.
2. FAST (family-oriented abstraction, specification and translation): a software development process that divides the process of a product line into three sections: domain qualification, domain engineering and application engineering.
3. FORM (feature-oriented reuse method): a feature-oriented method that, by analyzing the features of the domain, uses these features to provide the SPL architecture. FORM focuses on capturing commonalities and differences of applications in a domain in terms of features and uses the analysis results to develop domain architectures and components.
4. KobrA: a UML-based, component-oriented approach that integrates the component-based and product-line paradigms into a semantic, unified approach to software development and evolution.
5. QADA (quality-driven architecture design and analysis): a product line architecture design method that provides traceability between the product quality and design time quality assessment.
Semantic web linked data is the most important concept supporting semantic metadata enrichment (SME) in a SECO architecture.
Today, semantic web technologies, for example in digital libraries, offer a new level of flexibility and interoperability, and a way to enhance peer communication and knowledge sharing by expanding the usefulness of the digital libraries that in the future will contain the majority of data. Indeed, a semantic web engine, based on semantic web technology, returns more relevant results because it can understand the definition and user-specific meaning of the word or term being searched for. Semantic web engines are better able to understand the context in which the words are being used, resulting in relevant results with greater user satisfaction. Unfortunately, in the public domain there is a scarcity of search engines that follow a semantic-based approach to searching and browsing data. Furthermore, the web is currently not contextually organized.
Thus, to enrich web data by transforming it into knowledge accessible by users, we propose a multi-platform architecture, referred to as SMESE, which combines a CBSD approach, used to integrate distributed content management enterprise applications such as libraries, with the Software Product Line Engineering (SPLE) approach.
Our SMESE architecture includes mobile first design (MFD) and semantic metadata enrichment (SME) engines that consist of metadata and meta-entity enrichment based on mapping ontologies and a semantic master metadata catalogue (SMMC).
More specifically, our SMESE implements a new decision support process in the context of SPLE, called the SPLE decision support process (SPLE-DSP), a meta entity model that represents all library materials and a meta metadata model. SPLE-DSP allows support for metadata-based reconfiguration. It consists of a dynamic and optimized metadata based reconfiguration model (DOMRM) where users select their preferences in the market place.
The major contributions of this paper are:
1. Definition of a software ecosystem model that configures the application production process including software aspects based on a proposed CBSD and metadata-based SPLE approach.
2. Definition and partial implementation of semantic metadata enrichment using SPLE and a semantic master metadata catalogue (SMMC) to create a universal metadata knowledge gateway (UMKG).
3. Design and implementation of a SMESE prototype for a semantic digital library (Libër).
This paper proposes a semantic metadata enrichment software ecosystem (SMESE) to support multi-platform metadata driven applications, such as a semantic digital library. Based on mapping ontologies SMESE also integrates and enriches data and metadata to create a semantic master metadata catalogue (SMMC).
The remainder of the paper is organized as follows. Section 2 is a literature review. Section 3 presents the multi-platform architecture of the proposed SMESE, and Section 4, the related nine sub-systems. Section 5 presents the prototype of a SMESE implementation in an industry context. Section 6 presents a summary and ideas for future work.
2. Literature Review
A software product line (SPL) is a set of software intensive systems that share a common and managed set of features satisfying the specific needs of a particular market segment developed from a common set of core assets in a prescribed way. SPL engineering aims at: effective utilization of software assets, reducing the time required to deliver a product, improving quality, and decreasing the cost of software products.
The following sub-sections present the four research axes related to our research:
1. Software product line engineering (SPLE).
2. SECO architecture using component integration and component evolution.
3. SECO architecture and SPLE.
4. Semantic metadata enrichment (SME).
The related works section is at the intersection of SPLE, service-oriented computing, cloud computing, semantic metadata and adaptive systems.
2.1. Software Product Line Engineering (SPLE)
The development of software involves requirements analysis, design, construction, testing, configuration management, quality assurance and more, where stakeholders always look for high productivity, low cost and low maintenance. This has led to software product line engineering (SPLE) as a comprehensive model that helps software providers to build applications for organizations/clients based on a common architecture and core assets. SPLE deals with the assembly of products from current core assets, commonly known as components, within a component-based architecture, and involves the continuous growth of the core assets as production proceeds.
Note that the following related works are organized according to two axes: organizational and technical.
An overview of SPLE challenges is presented in    . Metzger and Pohl  suggest that the successful introduction of SPLE heavily depends on the implementation of adequate organizational structures and processes. They also identify three trends expected from SPLE research in the next decade:
1. Managing variability in non-product-line settings.
2. Leveraging instantaneous feedback from big data and cloud computing during SPLE.
3. Addressing the open world assumption in software product line settings.
A survey of works on search based software engineering (SBSE) for SPLE is presented in Harman et al.   .
Capilla et al.  provide an overview of the state of the art of dynamic software product line architectures and identify current techniques that attempt to tackle some of the many challenges of runtime variability mechanisms. They also provide an integrated view of the challenges and solutions that are necessary to support runtime variability mechanisms in SPLE models and software architectures. According to them, the limitations of today’s SPLE models are related to their inability to change the structural variability at runtime, provide the dynamic selection of variants, or handle the activation and deactivation of system features dynamically and/or autonomously. SPLE is, therefore, the natural candidate within which to address these problems. Since it is impossible to predict all the expected variability in a product line, SPLE must be able to produce adaptable software where runtime variations can be managed in a controlled manner. Also, to ensure performance in systems that have strong real-time requirements, SPLE must be able to handle the necessary adaptations and current reconfiguration tasks after the original deployment due to the computational complexity during variants selection.
Olyai and Rezaei  describe the issues and challenges surrounding SPLs, introduce some SPLE ecosystems and compare them, based on the issues and challenges, with a view to how each ecosystem might be improved. The issues and challenges are presented in terms of administrative and organizational aspects and technical aspects. The administrative and organizational comparison criteria include strategic plans of the organization while the technical comparison criteria include requirements, design, implementation, test and maintenance. According to them, there is not a single approach that takes into account all these criteria together. Also, no single approach takes into account metadata for implementation and testing.
2.2. SECO Architecture Using Components Integration and Components Evolution
Software ecosystems (SECO) consist of multiple software projects, often interrelated to each other by means of dependency relationships. When one project undergoes changes and issues a new release, this may or may not lead other projects to upgrade their dependencies. Unfortunately, the upgrade of a component may create a series of issues. In their systematic literature review of SECO research, Manikas and Hansen report that while research on SECO is increasing:
1. There is little consensus on what constitutes a SECO.
2. Few analytical models of SECO exist.
3. Little research is done in the context of real-world SECO.
They define a SECO as the interaction of a set of actors on top of a common technological platform that results in a number of software solutions or services where each actor is motivated by a set of interests or business models while connected to the rest of the actors. They also identify three main components of SECO architecture:
1. SECO software engineering: focuses on technical issues related directly or indirectly to the technological platform.
2. SECO business and management: focuses on the business, organizational and management aspects.
3. SECO relationships: represent the social aspect of the architecture since it is essential for SPLE actors to interact among themselves and with the platform.
2.3. SECO Architecture and SPLE
This section focuses on SECO architecture related to SPLE, beginning with an industry perspective.
Christensen et al.  define the concept of SECO architecture as a set of structures comprised of actors and software elements, the relationships among them, and their properties. They present the Danish telemedicine SECO in terms of this concept, and discuss challenges that are relevant in areas beyond telemedicine. They also discuss how software engineering practice is affected by describing the creation and evolution of a central SECO architecture, namely Net4Care, that serves as a reference architecture and learning vehicle for telemedicine and for the actors within a single software organization.
Demir also proposes a software architecture that is strongly related to a defence system and limited to military personnel. The multi-view SECO architecture design is described step by step, beginning with the identification of the system context, requirements, constraints, and quality expectations, but the end products of the SECO architecture are not described. The author also introduces a novel architectural style, called “star-controller architectural style”, where synchronization and control of the flow of information are handled by controllers. However, a major drawback of this style is that failure of one controller disables all the subcomponents attached to that controller.
Neves et al. propose an architectural solution based on an ontology and the spreading activation algorithm that offers personalized and contextualized event recommendations in the university domain. They use an ontology to define the domain knowledge model and the spreading activation algorithm to learn user patterns through discovery of user interests. The main limitation of their architectural context-aware recommender system is that it is specific to university populations and does not present the actual model of the system that shows the interactions between the components and the data.
Alferez et al.  propose a framework that uses semantically rich variability models at runtime to support the dynamic adaptation of service compositions. They argue that should problematic events occur, functional pieces may be added, removed, replaced, split or merged from a service composition at runtime, hence delivering a new service composition configuration. Based on this argument, they propose that service compositions be abstracted as a set of features in a variability model. They define a feature as a logical unit of behavior specified by a set of functional and non-functional requirements. Thus, they propose adaptation policies that describe the dynamic adaptation of a service composition in terms of the activation or deactivation of features in the causally connected variability model. Unfortunately, this variability model is limited to activation and deactivation of services. Indeed, the model should allow adaptation of services or include a service interoperability protocol (SIP) rather than compositions only according to changes in the computing infrastructure.
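The idea of describing adaptation as the activation or deactivation of features can be illustrated with a minimal sketch. It is not the framework of Alferez et al.; the class `VariabilityModel`, the `POLICIES` table, and the feature and event names are hypothetical and serve only to show how a problematic event could be mapped to a new configuration.

```python
from dataclasses import dataclass, field


@dataclass
class VariabilityModel:
    """Tracks which features of a service composition are currently active."""
    features: set[str]
    active: set[str] = field(default_factory=set)

    def apply(self, activate=(), deactivate=()):
        """Activate/deactivate features and return the new configuration."""
        unknown = (set(activate) | set(deactivate)) - self.features
        if unknown:
            raise ValueError(f"unknown features: {sorted(unknown)}")
        self.active = (self.active - set(deactivate)) | set(activate)
        return set(self.active)


# Hypothetical adaptation policies:
# event name -> (features to activate, features to deactivate)
POLICIES = {
    "payment_service_down": ({"backup_payment"}, {"primary_payment"}),
}


def adapt(model, event):
    """Apply the policy for an event; unknown events leave the model unchanged."""
    activate, deactivate = POLICIES.get(event, ((), ()))
    return model.apply(activate, deactivate)


model = VariabilityModel(
    features={"primary_payment", "backup_payment", "catalogue"},
    active={"primary_payment", "catalogue"},
)
new_config = adapt(model, "payment_service_down")
```

A policy here can only switch features on or off, which mirrors the limitation noted above: the sketch has no means of adapting a service itself.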
In component based software development (CBSD), the fuzzy logic approach   is largely used to select components. Singh et al.  explored the various measures such as separation of concerns (SoC), coupling, cohesion, and size measure that affect the reusability of aspect oriented software. The main drawback of their contribution is that the fuzzy logic rules are static. They do not propose a way to improve the rules based on developer satisfaction of the fuzzy inference system (FIS) output. In addition, their fuzzy inference system is limited to reusability of software.
2.4. Semantic Metadata Enrichment (SME)
Bontcheva et al.  investigate semantic metadata automatic enrichment and search methods. In particular, the benefits of enriching articles with knowledge from linked open data resources are investigated with a focus on the environmental science domain. They also propose a form-based semantic search interface to facilitate environmental science researchers in carrying out better semantic searches. Their proposed model is limited to linking terms with DBpedia URI and does not take into account the semantic meaning of terms in order to detect the best DBpedia URI.
Some authors focus their enrichment model on person mobility trace data     . Krueger et al.  show how semantic insights can be gained by enriching trajectory data with place of interest (POI) information using social media services. They handle semantic uncertainties in time and space, which result from noisy, imprecise, and missing data, by introducing a POI decision model in combination with highly interactive visualizations. However, this model is limited to POI detection.
Kunze and Hecht propose an approach to processing semantic information from user-generated OpenStreetMap (OSM) data that specifies non-residential use in residential buildings based on OSM attributes, so-called tags, which are used to define the extent of non-residential use.
Our conclusions from these related works are:
1. SPLE architecture needs to be flexible and meet administrative and organizational aspects such as the organization’s strategic plans and marketing strategies, as well as technical aspects such as requirements, design, implementation, test and maintenance.
2. Researchers need to focus on real-world SECO.
3. Several proposed SECO models do not take into account autonomic mechanisms to guide the self-adaptation of service compositions according to changes in the computing infrastructure.
4. In CBSD, fuzzy inference systems (FIS) have been employed to develop the component selection model; however, there is no FIS-based model that proposes more than one software measure as FIS output.
5. There is no SECO architecture that takes into account several semantic enrichment aspects.
6. Current metadata and entity enrichment models are limited to only one domain for their semantic enrichment process and therefore do not involve several enriched metadata and entity models.
7. Current metadata and entity enrichment models only link terms and DBpedia URI.
8. Current metadata and entity enrichment models do not take into account person mobility trace data gathering and analysis in the enrichment process of metadata.
3. SMESE Multi-Platform Architecture
This section presents the proposed semantic enriched metadata software ecosystem (SMESE) architecture based on SPLE and CBSD approaches to support metadata and entity social and semantic enrichment for semantic digital libraries and based on an MFD approach for user interface design. Each component of the SMESE architecture is based on existing approaches (SPLE and CBSD) and an SME concept (proposed in this work) to generate, extract, discover and enrich metadata based on mapping ontologies and making use of contents and linked data analysis.
For the new generation of information and data management, metadata is among the most efficient materials for data aggregation. For example, it is easier to find a specific set of interests for users based on metadata such as content topics, or based on the sentiments expressed in the content. Furthermore, it is possible to increase user satisfaction by reducing the user interest gap. To make this feasible, all content needs to be enriched. In other words, specific metadata must be available including semantic topics, sentiments and abstracts. However, at the present time more than 85% of content does not have this metadata.
The SMESE multiplatform prototype includes an engine to aggregate multiple world catalogues from libraries, universities, bookstores, #tag collections, museums, and cities. The collection of pre-harvested and processed metadata and full text comprises the searchable content.
Central indexes typically include: full text and citations from publishers, full text and metadata from open source collections, full text, abstracting, and indexing from aggregators and subscription databases, and different formats (such as MARC) from library catalogues, also called the base index, unified index, or foundation index.
The SMESE multiplatform framework must link bibliographic records and semantic metadata enrichments into a digital world library catalogue. SMESE must search and discover actual collections or novelties, including: works, books, DVDs, CDs, comics, games, pictures, videos, people, legacy collections, organizations, rewards, TVs, radios, and museums.
The five levels of the semantic collaborative gateway are:
1. Meta Entity.
3. Semantic metadata enrichment and creation.
4. Free sources of metadata and subscription-based metadata.
Figure 1 presents the entity matrix. The metadata are defined once and are related to each specific entity.
Figure 1. Entity matrix.
Semantic relationships between the contents, persons, organization and places are defined and curated in the master metadata catalogue. Topics, sentiments and emotions must be extracted automatically from the contents and their context:
1. Libraries spend a lot of money buying books and electronic resources. Enrichment uncovers that information and makes it possible for people to discover the great resources available everywhere.
2. The average library has hundreds of thousands of catalogue records waiting to be transformed into linked data, turning those thousands of records into millions of relationships.
FRBR (functional requirements for bibliographic records) is a semantic representation of the bibliographic record. A work is a high-level description of a document, containing information such as author (person), title, descriptions, subjects, etc., common to all expressions, format and copy of the work (see Figure 2 for an FRBR framework description).
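The FRBR hierarchy described above (a work, its expressions, their formats, and individual copies) can be sketched as a small data structure. This is only an illustration using the standard FRBR level names Work, Expression, Manifestation and Item; the attribute names and the sample record are hypothetical and are not taken from the SMESE implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A single physical or digital copy."""
    barcode: str


@dataclass
class Manifestation:
    """A published format of an expression (e.g., print, e-book)."""
    fmt: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of the work, e.g., a translation."""
    language: str
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """High-level description shared by all expressions, formats and copies."""
    title: str
    author: str
    subjects: list[str] = field(default_factory=list)
    expressions: list[Expression] = field(default_factory=list)

    def copies(self):
        """Collect every item reachable from this work."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


# Hypothetical sample record
work = Work(
    title="Example Work",
    author="A. Author",
    subjects=["example"],
    expressions=[
        Expression("fr", [Manifestation("print", [Item("0001"), Item("0002")])]),
        Expression("en", [Manifestation("e-book", [Item("0003")])]),
    ],
)
```

Because title, author and subjects are stored once on the work, every expression, format and copy inherits them without duplication.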
SMESE must allow users to find topically related content through an interest-based search and discovery engine. Transforming bibliographic records into semantic data is a complex problem that includes interpreting and transforming the information. Fortunately, many international organizations (e.g., BNF, Library of Congress and some others) have partly done this heavy work and already have much bibliographic metadata converted into triple-stores.
Recent catalogues support the ability to publish and search collections of descriptive entities (described by a list of generic metadata) for data, content, and related information objects. Metadata in catalogues represent resource characteristics that can be indexed, queried and displayed by both humans and software. Catalogue metadata are required to support the discovery and notification of information within an information community. Using the information from these semantic metadata enrichments, the search, discovery and notification engines are able to give the end user better results in accordance with their interests or mood.
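As a rough illustration of what converting a bibliographic record into triples involves, the following sketch maps a flat record to (subject, predicate, object) tuples. The record layout, the `example.org` URI pattern and the function name are assumptions made for this example; the predicates are Dublin Core terms, and no particular triple-store or library is implied.

```python
# Assumed mapping from hypothetical record fields to Dublin Core terms
FIELD_TO_PREDICATE = {
    "title": "dcterms:title",
    "author": "dcterms:creator",
    "subjects": "dcterms:subject",
}


def record_to_triples(record_id, record):
    """Turn a flat bibliographic record into (subject, predicate, object) triples."""
    subject = f"http://example.org/record/{record_id}"  # hypothetical URI scheme
    triples = []
    for field_name, predicate in FIELD_TO_PREDICATE.items():
        value = record.get(field_name)
        if value is None:
            continue  # field absent from this record
        values = value if isinstance(value, list) else [value]
        triples.extend((subject, predicate, v) for v in values)
    return triples


triples = record_to_triples(
    "1",
    {"title": "Example Work", "author": "A. Author",
     "subjects": ["poetry", "history"], "shelf": "Z-12"},
)
```

Fields without a mapping (here `shelf`) are silently dropped, which hints at why the interpretation step is the hard part of the transformation.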
Figure 2. FRBR framework description.
SMESE must also include an automated approach for semantic metadata enrichment (SME) that allows users to perform interest-based semantic search or discovery more efficiently. To summarize, our SMESE makes the following contributions:
Definition and development of a proposed semantic metadata enrichment software ecosystem (see Figure 3 for an overview of SMESE and Appendix B for the detailed version).
This new semantic ecosystem will harvest and enrich bibliographic records externally (from the web) and internally (from text data). The main components of the ecosystem will be:
1. Metadata initiatives & concordance rules
2. Harvesting web metadata & data
3. Harvesting authority metadata & data
4. Rule-based semantic metadata external enrichment engine
5. Rule-based semantic metadata internal enrichment engine
6. Semantic metadata external & internal enrichment synchronization engine
7. User interest-based gateway
8. Semantic master catalogue
A. Topic detection/generation: A prototype was developed to automate the generation of topics from the text of a document using our algorithm BM-SATD (Semantic Annotation-based Topic Detection). In this research prototype, the following issues were investigated:
1. Semantic annotations can improve the processing time and comprehension of the document.
2. Extending topic modeling to take co-occurrence into account, so that semantic relations and co-occurrence relations complement each other.
3. Since latent co-occurrence relations between two terms cannot be measured in an isolated term-term view, the context of the term must be taken into account.
4. Use of machine learning techniques to allow the SMESE ecosystem to find new topics by itself.
Figure 3. Semantic Enriched Metadata Software Ecosystem (SMESE) architecture.
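The role of context in measuring co-occurrence, raised in the topic detection list above, can be illustrated with a simple windowed count. This is not the BM-SATD algorithm, whose details are not given here; the function name, the `window` parameter and the sample sentences are assumptions chosen only to show how term pairs are counted within a local context rather than in isolation.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count unordered term pairs that appear within `window` tokens of each other."""
    counts = Counter()
    for tokens in sentences:
        for i, left in enumerate(tokens):
            # Only look at the next `window` tokens, i.e., the local context
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    counts[tuple(sorted((left, right)))] += 1
    return counts


# Hypothetical, already tokenized sentences
sentences = [
    ["semantic", "web", "search"],
    ["semantic", "search", "engine"],
]
counts = cooccurrence_counts(sentences)
```

A smaller window restricts the count to tightly coupled terms, while a larger one approaches document-level co-occurrence.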
B. Sentiment/Emotion Analysis: The prototype developed has the following characteristics:
1. Traditional sentiment analysis methods mainly use terms and their frequency, parts of speech, rules of opinion and sentiment shifters; but semantic information is ignored in term selection.
2. Our contribution to sentiment analysis includes emotions.
3. The human contribution to improve the accuracy of our approach is taken into account.
4. Sentiment and emotion analysis are combined.
5. It is important to identify the sentiment and emotion of a book taking into account all the books of the collection.
6. The collection of documents and paragraphs are taken into account. In terms of granularity, most of the existing approaches are sentence-based.
7. These approaches did not take into account the surrounding context of the sentence which may cause some misunderstanding with discovery of sentiment/emotion. In our approach, the surrounding context of the sentence is included.
The prototype makes use of the proposed algorithm BM-SSEA (Semantic Sentiment and Emotion Analysis). The SMEE algorithm fulfills all the attributes of Table 2.
Table 2. SMESE characteristics.
More specifically, the proposed SPLE approach is a combination of FORM and COPA approaches focusing on data and metadata enrichment. Through the combination of these two approaches, the following can be taken into account:
1. Administrative and organizational aspects such as roles and responsibilities, intergroup communication capabilities, personnel training, adoption of new technologies, strategic plans of the organization and marketing strategies.
2. Technical aspects such as requirements, design, implementation, test and maintenance.
With respect to CBSE, our SMESE includes a method for selecting composer components for design of an SPLE. This method can manage and control the complexities of the component selection problem in the creation of the declared product line. Also, the SMESE architecture supports runtime variability and multiple and dynamic binding times of products.
4. Subsystems within the SMESE Multi-Platform Architecture
The following sub-sections present in more detail the nine subsystems designed for the prototype of this SMESE architecture.
4.1. Metadata Initiatives & Concordance Rules
This section presents the details of the metadata initiatives & concordance rules, specifically the semantic metadata meta-catalogue (SMMC) as shown in Figure 4.
Metadata is structured information that describes, explains, locates, accesses, retrieves, uses, or manages an information resource of any kind. Metadata refers to data about data. Some use it to refer to machine understandable information, while others employ it only for records that describe electronic resources. In the library ecosystem, metadata is commonly used for any formal scheme of resource description, applying to any type of object, digital or non-digital. Many metadata schemes exist to describe various types of textual and non-textual objects including published books, electronic documents, archival documents, art objects, educational and training materials, scientific datasets and, obviously, the web.
Libraries and information centers are the intermediaries between the information, information sources and users. In order to make information accessible, libraries perform several activities, one of the most important and fundamental of which is cataloguing. The technological developments of the past 25 years have radically transformed both the process of cataloguing and access to information through catalogues.
Several rules have been proposed to cover the description and provision of access points for all library materials (entities). These rules are based on an individual framework for the description of library materials. There is no ecosystem that allows the creation of universal, understandable and readable metadata that would describe all entities used in a library.
The best-known metadata models are:
1. Dublin Core (DC): primarily designed to provide a simple resource description format for networked resources. DC does not have any coding to provide the necessary details for the specification of a record that could be converted to a machine-readable coding such as UNIMARC or MARC21.
2. UNIMARC: consists of data formulated by highly controlled cataloguing codes. This format is difficult to understand and unreadable for the end user. For this reason, MARC21 was proposed.
3. MARC21: is both flexible and extensible and allows users to work with data in ways specific to individual library needs. MARC21 remains difficult to understand, however.
4. RDF/RDA: mainly in Europe, is a new model that includes FRBRized Bibliographic Records.
5. BIBFRAME: mainly in North America, is a new model that includes FRBRized Bibliographic Records.
In addition, there is no mapping model among these that would make them interoperable. The overall challenge is to develop: (1) a modeling of partial international standardization of entities, (2) a modeling of partial international standardization of metadata, and (3) a modeling of partial international standardization of metadata mapping ontology.
Unfortunately, the power of metadata is limited: indeed, large national and international digital library projects, such as Europeana and the Digital Public Library of America, have highlighted the importance of sharing metadata across silos. While both of these projects have been successful in harvesting collections data, they have had problems with rationalizing the data and forming a coherent and semantic understanding of the aggregation.
In addition, organizations create digital collections and generate metadata in repository silos. Generally such metadata does not:
1. Connect the digitized items to their analogue sources.
2. Connect names to authority records (persons, organizations, places, etc.) nor subject descriptions to controlled vocabularies.
3. Connect to related online items accessible elsewhere.
Aggregators harvest this metadata, which generally becomes inaccurate in the process. In fact, aggregators usually ignore idiosyncratic uses of metadata schemas and enforce the use of designated metadata fields.
Connecting data across silos would help improve the ability of users to browse and navigate related entities without having to do multiple searches in multiple portals. The proposed model defines crosswalks that create pathways to different sources; each pathway checks the structure of the metadata source and then performs data harvesting. Figure 4 shows the SMMC model that addresses this issue.
Figure 4. Semantic metadata meta-catalogue (SMMC).
In SMESE the metadata is classified into six categories:
1. Descriptive metadata: describes and identifies information resources at the local (system) level to enable searching and retrieving (e.g., searching an image collection to find paintings of animals) at the web-level, and to enable users to discover resources (e.g., searching the web to find digitized collections of poetry). Such metadata includes unique identifiers, physical attributes (media, dimensions, conditions) and bibliographic attributes (title, author/creator, language, keywords).
2. Structural metadata: facilitates navigation and presentation of electronic resources and provides information about the internal structure of resources (including page, section, chapter numbering, indexes, and table of contents) in order to describe relationships among materials (e.g., photograph B was included in manuscript A), and to bind the related files and scripts (e.g., File A is the JPEG format of the archival image File B).
3. Administrative metadata: facilitates both short-term and long-term management and processing of digital collections and includes technical data on creation and quality control, rights management, access control and usage requirements.
4. Dimension, longevity and identification metadata: new classifications that aim to increase user satisfaction in terms of expected interests and emotions. For example, dimension metadata groups all metadata about space, time, emotions and interests, which allows users to find specific content; emotions, for instance, may suggest specific content to a particular user at a specific time and place. Furthermore, the source metadata identifies the provenance and the rights relative to the creation of the metadata.
4.2. Harvesting of Web Metadata & Data
Web metadata and data are harvested from sources such as:
1. Semantic digital resources
2. Digital resources
3. Portal/websites events
4. Social networks & events
5. Enrichment repositories
6. Discovery repositories
The integration of these sources in SMESE allows users to aggregate and enrich metadata and data.
4.3. Harvesting Authority Metadata & Data
This sub-section presents the details of the Harvesting of Authorities Metadata & Data.
The Semantic Multi-Platform Ecosystem consists of many authority sources, such as:
1. BAnQ (Bibliothèque et Archives nationales du Québec)
2. BAC (Bibliothèque et Archives du Canada)
3. BNF (Bibliothèque Nationale de France)
4. Library of Congress
5. British Library
7. Spanish Library
The integration of these platforms in SMESE allows users to build an integrated authorities knowledge base.
4.4. Rules-Based Semantic Metadata External Enrichments Engine
This sub-section presents the details of the rule-based semantic metadata external enrichment engine.
Semantic searches over documents and other content types need semantic metadata enrichment (SME) to find information based not just on the presence of words, but also on their meaning. SME consists of:
1. Rule-based semantic metadata external enrichment engine.
2. Multilingual normalization.
3. Rule-based data conversion.
4. Harvesting metadata & data.
Linked open data (LOD) based semantic annotation methods are good candidates to enrich the content with disambiguated domain terms and entities (e.g. events, emotions, interests, locations, organizations, persons) described through Uniform Resource Identifiers (URIs), see Figure 5. In addition, the original contents should be enriched with relevant knowledge from the respective LOD resources (e.g. that Justin Trudeau is a Canadian politician). This is needed to answer queries that require common-sense knowledge, which is often not present in the original content. For example, following semantic enrichment, a semantic search for events in Montreal this weekend that evoke specific emotions and match individual interests would return relevant metadata about such events, even though this information is not explicitly mentioned in the original content metadata.
Figure 5. Linked Open Data (LOD).
The semantic annotation process of SMESE creates relationships between semantic models, such as ontologies and persons. It may be characterized as the semantic enrichment of unstructured and semi-structured contents with new knowledge and linking these to relevant domain ontologies/knowledge bases. It typically requires annotating a potentially ambiguous entity mention (e.g. Justin Trudeau) with the canonical identifier of the correct unique entity (e.g. depending on the content, http://dbpedia.org/page/Justin_Trudeau). The benefit of social semantic enrichment is that by surfacing annotated terms derived from the full-text content, concepts buried within the body of the paper/report can be highlighted. Also, the addition of terms affects the relevance ranking in full-text searches. Moreover, users can be more specific by limiting the search criteria to the subject or interest or emotion metadata (e.g. through faceted search).
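The annotation step described above, mapping an ambiguous mention to a canonical identifier, can be illustrated with a minimal sketch. It assumes a small hard-coded gazetteer and a keyword-overlap heuristic for disambiguation; the `GAZETTEER` table, the context keywords and the `annotate` function are hypothetical and are not part of SMESE, which would rely on LOD resources instead.

```python
import re

# Hypothetical gazetteer: mention -> list of (candidate URI, context keywords).
# A real annotator would query LOD resources rather than a hard-coded table.
GAZETTEER = {
    "justin trudeau": [
        ("http://dbpedia.org/page/Justin_Trudeau", {"politician", "canada", "minister"}),
    ],
    "paris": [
        ("http://dbpedia.org/page/Paris", {"france", "capital", "seine"}),
        ("http://dbpedia.org/page/Paris,_Texas", {"texas", "county", "usa"}),
    ],
}


def annotate(text):
    """Return (mention, uri) pairs for every gazetteer mention found in text."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z]+", lowered))
    annotations = []
    for mention, candidates in GAZETTEER.items():
        # Match whole words only, so "paris" does not match "comparison".
        if not re.search(r"\b" + re.escape(mention) + r"\b", lowered):
            continue
        # Disambiguate: keep the candidate whose context keywords overlap
        # the surrounding text the most (ties go to the first candidate).
        best_uri, _ = max(candidates, key=lambda c: len(c[1] & tokens))
        annotations.append((mention, best_uri))
    return annotations


print(annotate("Justin Trudeau visited Paris, the capital of France."))
print(annotate("A rodeo was held in Paris, Texas."))
```

The resulting pairs could then be stored as subject or entity metadata, which is what makes faceted filtering on those fields possible.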
4.5. Rule-Based Semantic Metadata Internal Enrichments Engine
This sub-section presents the details of the rule-based semantic metadata internal enrichment engine including software product line engineering (SPLE).
This sub-system includes:
1. A rule-based semantic metadata internal enrichment engine.
2. A multilingual normalization process.
3. Software Product Line Engineering (SPLE)
4. A topic, sentiment/emotion, abstract analysis and an automatic literature review.
These processes extract, analyze and catalogue metadata for topics and emotions involved in the SMESE ecosystem. These enrichment processes are based on information retrieval and knowledge extraction approaches. The text is analyzed using extensions of text mining algorithms such as latent Dirichlet allocation (LDA), latent semantic analysis (LSA), support vector machine (SVM) and k-Means.
The different phases of the enrichment process by topics are:
1. Relevant and less similar documents selection phase.
2. Semantic term graph generation phase for non-annotated documents.
3. Topics detection phase.
4. Training phase.
5. Topics refining phase.
The different phases of the enrichment process by sentiments and emotions are:
1. Sentiment and emotion lexicon generation phase.
2. Sentiment and emotion discovery phase.
3. Sentiment and emotion refining phase.
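The three sentiment and emotion phases listed above can be sketched in a simplified form. This is only an illustration under strong assumptions: the lexicon is a tiny hand-written dictionary (`SEED_LEXICON`, hypothetical), discovery is plain word counting, and refining is a share threshold (`min_share`, hypothetical). The actual SMESE processes are not specified at this level of detail in the text.

```python
import re
from collections import Counter

# Phase 1 (lexicon generation), reduced here to a hypothetical seed lexicon.
SEED_LEXICON = {
    "joyful": "joy",
    "happy": "joy",
    "grief": "sadness",
    "terrifying": "fear",
}


def discover_emotions(text, lexicon):
    """Phase 2: count the emotions evoked by lexicon words found in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def refine(counts, min_share=0.3):
    """Phase 3: keep only emotions that account for at least min_share of hits."""
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(e for e, n in counts.items() if n / total >= min_share)


counts = discover_emotions("A joyful, happy story with a terrifying ending.", SEED_LEXICON)
print(counts)
print(refine(counts, min_share=0.5))
```

The refined labels are the kind of values that could be catalogued as emotion metadata for a document.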
One of the contributions of the SMESE for digital libraries is that it is not specific to one software product but can be applied to many products dynamically. In addition, it includes a semantic metadata enrichment (SME) process to improve the quality of search and discovery engines.
Indeed, our goal is to provide a SECO that offers a new way to share and learn knowledge. In practice, with the emergence of Big Data, knowledge is not easy to find at the right time and place. The proposed ecosystem uses an SPLE architecture that is a combination of FORM and COPA approaches to catalogue semantically different contents.
Furthermore, we introduce an SPLE decision support process (SPLE-DSP) in order to meet SPLE characteristics such as:
1. Runtime variability functionalities support.
2. Multiple and dynamic binding.
3. Context-awareness and self-adaptation.
SPLE-DSP supports the activation and deactivation of features and changes in the structural variability at runtime and takes into account automatic runtime reconfiguration according to different scenarios. In addition, SPLE-DSP rebinds to new services dynamically based on the description of the relationships and transitions between multiple binding times under an SPLE when the software adapts its system properties to a new context. To take into account context variability to model context-aware properties, SPLE-DSP makes use of an autonomous robot that exploits context information to adapt software behavior to varying conditions.
Furthermore, SPLE-DSP integrates the adaptation of assets and products dynamically. This helps products to evolve autonomously when the environment changes and provides self-adaptive and optimized reconfiguration. Additionally, SPLE-DSP exploits knowledge and context profiling as a learning capability for autonomic product evolution by enhancing self-adaptation.
The SPLE-DSP model is an optimized metadata based reconfiguration model where users select their preferences in terms of configuration of interests.
The dynamic and optimized metadata-based reconfiguration model (DOMRM) takes into account the preferences of several users who have distinct requirements in terms of desirable features and measurable criteria. For example:
1. In terms of hardware criteria, the user can select preferences in terms of memory and power consumption or feature attributes such as internet bandwidth or screen resolution.
2. In terms of software criteria, the user can select the entities and their properties, the property characteristics such as the displaying mode, and expected value type.
Indeed, when user preferences change at runtime, the system must be reconfigured to satisfy as many preferences as possible. Since user preferences may be contradictory, only some of them can be satisfied, possibly partially, and an algorithm is needed to compute the most suitable reconfiguration. To address this, we developed a new metadata-based feature model, referred to as the BiblioMondo semantic feature model (BMSFM), to represent user preferences in terms of semantic features and attributes. Our BMSFM constitutes an evolution of traditional stateful feature models  that includes the set of user metadata-based configurations in the model itself, which allows the representation of user decisions with attributes and cardinalities. More specifically, we developed a metadata-based reconfiguration model that defines all possible metadata and all possible entities that users may need in a specific domain. When users need new metadata, they use the metadata-based request creation tool. The DOMRM model analyses the request and checks whether the requested metadata is relevant and does not already exist. If so, the model automatically creates the new metadata and reconfigures the ecosystem, and the new metadata then becomes available to all users.
Figure 6 illustrates the DOMRM model we designed that is an optimized metadata based configuration for multiple users.
Figure 6. Optimized metadata based configuration for multiple users―DOMRM model.
When the user chooses preferences in terms of system behavior, the semantic weight of each feature is computed based on the feature configuration model (FCM). FCM represents the semantic relationship between features where each feature is active or not. In addition, FCM defines the rules that control the activation status of each feature according to its links with the other features. For example, a rule may be: feature Fi should never be activated when Fi-1 is activated. Based on this rule, the model automatically activates or deactivates the feature.
The rules are also used to predict the behavior of the application based on the activation status of features according to user preferences. Notice that each user has his own weight per feature that is defined based on his use of the feature. This weight quantifies the importance of the feature for the user (more details about the DOMRM algorithm appear in Appendix A).
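The kind of FCM rule given as an example above (a feature must not be active while another given feature is active) can be sketched as follows. The representation is an assumption made for illustration: rules are stored in a hypothetical `excludes` mapping, and features are resolved in a fixed priority order so that earlier features win. The text does not specify how FCM resolves such conflicts.

```python
def resolve(requested, order, excludes):
    """Return the features that end up active.

    requested: set of features users asked to activate.
    order:     list fixing the (assumed) priority order of all features.
    excludes:  mapping feature -> set of features that must stay inactive
               while that feature is active.
    """
    active = []
    for feature in order:
        if feature not in requested:
            continue
        # Skip the feature if an already active feature excludes it.
        if any(feature in excludes.get(a, ()) for a in active):
            continue
        active.append(feature)
    return active


ORDER = ["F1", "F2", "F3"]
# Hypothetical rules: F2 is never active with F1, F3 is never active with F2.
EXCLUDES = {"F1": {"F2"}, "F2": {"F3"}}

print(resolve({"F1", "F2", "F3"}, ORDER, EXCLUDES))
print(resolve({"F2", "F3"}, ORDER, EXCLUDES))
```

In the first call F2 is blocked by F1, which in turn leaves F3 free to be activated; in the second call F2 is active and therefore blocks F3.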
4.6. Semantic Metadata External & Internal Enrichments Synchronization Engine
This sub-section presents the semantic metadata external & internal enrichment synchronization engine, which determines which processes to synchronize and which enrichments to push outside the ecosystem.
4.7. User Interest-Based Gateway
This sub-section presents the user interest-based gateway (UIG) that represents the person (mobile or stationary) who interacts with the ecosystem.
The users and contributors are categorized into five groups:
1. Interest-based gateway (mobile-first),
2. Semantic Search Engine (SSE),
5. Metadata source selection.
4.8. Semantic Master Catalogue
This sub-section presents the semantic master catalogue (SMC) that represents the knowledge base of the SMESE ecosystem.
5. An Implementation of SMESE for a Large Semantic Digital Library in Industry
The proposed SMESE architecture has been implemented for a large digital library. The product In Média V5 was implemented with a global metadata model defined with all the known entities and constraints. The catalogue contains more than 2 million items, with 18 entities and 132 defined metadata. SMMC identifies 1453 metadata and defines a metamodel that consists of a semantic classification of metadata into meta entities.
In addition to semantic web technologies, the characteristics and challenges of SMESE for large digital libraries are:
1. Automatic cataloguing with the least human intervention.
2. Metadata enrichment.
3. Discovery and definition of semantic relationships between metadata and records.
4. Semi-automatic classification of bibliographic records.
5. Semantic cataloguing and validated metadata making use of a multilingual thesaurus.
First, we defined a list of entities, called Meta Entity, which introduced 193 items. These items represent all library materials. In addition, the structure of the model allows new entities to be added as required. Figure 7 shows the SMESE meta-entity model where for each entity there is: an ID, property Name, description, labels in different languages, and the domain that represents the logical group of the entity; for formatting reasons, Appendix C shows a readable version. The domain may be “user” as response value for a metadata. In this implementation, all instances of the entities of the domain can be the response value. The ID allows the user to uniquely identify the entity whatever the language, the source of entities or the metadata model (DC, UNIMARC, MARC21, RDA, BIBFRAME).
Next, the list of metadata is defined; it comprises 1341 metadata. Each metadata entry has the following additional metadata called Meta Metadata: ID, related Content Type, is Enrichment, is Repeatable, thesaurus, type, and source Of Schema, which are defined as follows:
1. “source Of Schema” represents the origin.
2. “id” allows unique identification of the entity.
3. “property Name” is a comprehensive term that defines this metadata.
4. “UNIMARC”, “MARC21”, “property Name” allow users to create a mapping between them to make them interoperable.
5. “UNIMARC” and “MARC21” are codes such as 300$abcf.
6. “Expected type” represents the type of value that may be assigned to the metadata as response.
7. “isRelated” denotes that the response of the metadata is an entity where the identity is given by “related Content Type”.
8. “thesaurus” mentions the thesaurus name that is used to control the metadata integrity.
9. “type” allows classification of the metadata as “descriptive”, “structural”, “administrative”, “dimension”, “longevity” or “identification”.
This classification allows users to do meta research. Figure 8 shows an illustration of the Meta Metadata model; Appendix D shows a readable version.
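To make the Meta Metadata fields listed above more concrete, the following sketch models one entry as a Python dataclass. The field names are transliterations of the names in the list, the validation rules are assumptions, and the sample values (identifier `M0001` and the MARC codes) are purely illustrative rather than taken from the SMESE catalogue.

```python
from dataclasses import dataclass
from typing import Optional

# The six classification values named in the text.
METADATA_TYPES = {
    "descriptive", "structural", "administrative",
    "dimension", "longevity", "identification",
}


@dataclass(frozen=True)
class MetaMetadata:
    id: str                      # unique identifier
    property_name: str           # comprehensive term defining the metadata
    source_of_schema: str        # origin of the metadata definition
    expected_type: str           # type of value that may be assigned
    type: str                    # one of METADATA_TYPES
    is_enrichment: bool = False
    is_repeatable: bool = False
    is_related: bool = False     # value is an entity of related_content_type
    related_content_type: Optional[str] = None
    thesaurus: Optional[str] = None
    unimarc: Optional[str] = None   # e.g. a field/subfield code
    marc21: Optional[str] = None

    def __post_init__(self):
        # Assumed integrity checks; the paper does not list them explicitly.
        if self.type not in METADATA_TYPES:
            raise ValueError(f"unknown metadata type: {self.type}")
        if self.is_related and self.related_content_type is None:
            raise ValueError("is_related requires related_content_type")


title = MetaMetadata(
    id="M0001", property_name="title", source_of_schema="DC",
    expected_type="string", type="descriptive",
    unimarc="200$a", marc21="245$a",
)
print(title.property_name, title.unimarc, title.marc21)
```

Holding the UNIMARC and MARC21 codes next to the property name in a single record is what allows a mapping between the formats to be derived from the catalogue itself.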
The semantic matrix model is defined for each entity based on the metaentity and metadata model. This semantic matrix model allows users to define a metadata matrix for each entity, where a metadata matrix denotes the logical subset of metadata of the metadata model that describes a given entity. Figure 9 illustrates an example of a semantic metadata matrix for a specific content; Appendix E presents a readable version. The objective behind the matrix is to allow the reuse of metadata for distinct entities. This extends the search range for entities, facilitates the search for users in terms of search criteria and increases the probability of achieving satisfying results.
Figure 7. SMESE Meta Entity model.
Figure 8. SMESE metadata model.
After the definition of entities of collections and harvesting of metadata from the dispersed collections, a metadata crosswalk is carried out. This is a process in which relationships among the schema are specified, and a unified schema is developed for the selected collection. It is one of the important tasks for building “semantic interoperability” among collections and making the new digital library meaningful.
The most frequent issues regarding mapping and crosswalks are: incorrect mappings, misuse of metadata elements, confusion between descriptive metadata and administrative metadata, and lost information. Indeed, due to the varying degrees of depth and complexity, the crosswalks among metadata schemas may not necessarily be equally interchangeable. To solve the issue of varying degrees of depth, we developed atomic metadata: these metadata allow description of the most elementary aspects of an entity. It then becomes easy to map all metadata from any schema.
Figure 9. Example of a SMESE semantic matrix model.
Figure 10 illustrates a mapping ontology model where relationships are in red while simple descriptions are in black.
Figure 11 shows that each entity has at a minimum one source of schema denoted by the relationship “has Source” and a minimum of one metadata denoted by the relationship “has Metadata”. The relationship “same As” is used to denote the mapping between distinct metadata or entity schema source.
The output of the ontology is an OWL file. This OWL file is used by a crosswalk to automatically assign metadata values that are harvested from distinct sources. In the proposed ecosystem two sources are harvested: Discogs (www.discogs.com) for music and ResearchGate (www.researchgate.net) for academic papers.
Figure 10. Ontology mapping model.
Figure 11. Ontology mapping implementation using Protégé.
A total of 94,015,090 metadata records were collected from these two sources:
1. From Discogs, we collected 7,983,288 entities: 2,621,435 music releases, 4,466,660 artists and 895,193 labels.
2. From ResearchGate, we collected 86,031,802 entities: 77,031,802 publications and more than 9,000,000 researchers.
In fact, SMESE contains more than 3.4 billion triplets, and this number is growing.
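The crosswalk step described in this section, where “same As” mappings are used to assign harvested values to a unified schema, can be approximated by the sketch below. It replaces the OWL file with a plain dictionary, and all source field names (`artists`, `publication_title`, and so on) are hypothetical; they are not the actual Discogs or ResearchGate schemas.

```python
# Hypothetical "same As" mappings: (source, source field) -> unified metadata.
SAME_AS = {
    ("discogs", "title"): "title",
    ("discogs", "artists"): "creator",
    ("researchgate", "publication_title"): "title",
    ("researchgate", "authors"): "creator",
}


def crosswalk(source, record):
    """Map a harvested record onto the unified schema.

    Returns (unified, unmapped). Unmapped fields are kept rather than
    dropped, so that no harvested information is silently lost.
    """
    unified, unmapped = {}, {}
    for field, value in record.items():
        target = SAME_AS.get((source, field))
        if target is None:
            unmapped[field] = value
        else:
            unified[target] = value
    return unified, unmapped


release = {"title": "Kind of Blue", "artists": ["Miles Davis"], "catno": "CL 1355"}
print(crosswalk("discogs", release))
```

Returning the unmapped fields separately is one possible way to surface the lost-information problem mentioned earlier, since it shows exactly which source fields lack a mapping.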
6. Summary and Future Work
In this paper, we proposed a design and implementation of a semantic enriched metadata software ecosystem (SMESE).
The SMESE prototype, which was implemented at BiblioMondo, integrates data and metadata enrichment to support specific applications for distributed content management. To perform this integration, SMESE makes use of the software product line engineering (SPLE) approach, a component-based software development (CBSD) approach and our proposed new concept, called semantic metadata enrichment (SME) with distributed contents and mobile first design (MFD). In this implementation, the SPLE architecture is a combination of FORM and COPA approaches.
We also presented our implementation of SMESE for digital libraries. This included SPLE-DSP, a new decision support process for SPLE. SPLE-DSP consists of a dynamic and optimized metadata based reconfiguration model (DOMRM) where users select their preferences in the market place. SPLE-DSP takes into account runtime variability functionalities, multiple and dynamic binding, context-awareness and self-adaptation.
We also implemented the Meta Entity that represents all library materials and meta metadata. The ontology mapping model was then implemented to make our models interoperable with existing metadata models such as Dublin Core, UNIMARC, MARC21, RDF/RDA and BIBFRAME.
The major contributions of this paper are as follows:
1. Definition of a software ecosystem architecture (SMESE) that configures the application production process including software aspects based on CBSD and SPLE approaches.
a) The use of a LOD-based semantic enrichment model for semantic annotation processes.
b) The integration of National Research Council of Canada (NRC) emotion lexicon for emotion detection.
c) A repository of 43 thesauri included in RAMEAU for semantic contextualization of concepts.
d) An extended latent Dirichlet allocation (LDA) algorithm for topic modeling.
2. Definition and partial implementation of semantic metadata enrichment using metadata SPLE and an SMMC (semantic master metadata catalogue) to create a universal metadata knowledge gateway (UMKG).
3. The design and implementation of an SMESE prototype for a semantic digital library (Libër).
This paper proposed a semantic metadata enrichments software ecosystem (SMESE) to support multi-platform metadata driven applications, such as a semantic digital library. Our SMESE integrates data and metadata based on mapping ontologies in order to enrich them and create a semantic master metadata catalogue (SMMC).
Within the SPLE context, SPLE-DSP is used by SMESE to support dynamic reconfiguration. This consists of a dynamic and optimized metadata based reconfiguration model (DOMRM) where users select their preferences within the market place. SPLE-DSP takes into account runtime metadata-based variability functionalities, multiple and dynamic binding, context-awareness and self-adaptation. Our SMESE represents more than 200 million relationships (triplets).
Future work will include:
1. An enhanced ecosystem of connecting engines and rule-based algorithms to enrich metadata semantically, including topics and sentiments/emotions.
2. Evaluation of the performance of an implementation of the SMESE ecosystem using different projects, comparing results against existing techniques of metadata enrichments.
3. Exploration of text summarization and automatic literature review as metadata enrichment: the semantic annotations could be used to enrich metadata and provide new types of visualizations by chaining documents backward and forward inside automated literature reviews.
Appendix A: Dynamic and Optimized Metadata-Based Reconfiguration Model (DOMRM)
This Appendix presents the details of the DOMRM model. The main idea behind DOMRM is that the more a user uses a specific feature, the more that user's weight for this feature increases. The weight UjFi of user j for feature i is given by:
where n(Uj, Fi) denotes the number of times user j used the feature i.
Using the user weights per feature and the user preferences, the feature weight that determines whether the feature is activated is computed. Let US be the set of users who have selected a feature Fi (activation of the feature) and UR the set of users who have removed that feature (deactivation of the feature); the value 1 is assigned when a user activates the feature, and −1 when the user removes it. Let c(Uj, Fi) be the choice of user j for the activation status of feature Fi. The weight of feature Fi can be defined using the following formula:
The computed weight of each feature allows one to define the weight FM that is used by the system optimal configurator with the FCM to generate the new configuration of the system for all users. When the feature weight is negative and the FIS rules allow deactivation, the DOMRM model deactivates the feature; when the feature weight is positive and the FIS rules allow activation, the DOMRM model activates the feature. When the feature weight is zero, the current activation status is preserved.
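The decision logic of this appendix can be sketched in code, with one important caveat: the two formulas are not reproduced in the text, so the sketch assumes that a user's weight is that user's share of all uses of the feature, and that the feature weight is the sum of each user's choice (+1 or −1) multiplied by that weight. Both definitions, and all function names, are assumptions made for illustration and may differ from the actual DOMRM formulas.

```python
def user_weights(usage):
    """usage: user -> n(Uj, Fi), the number of times the user used the feature.

    ASSUMPTION: the weight is the user's share of all uses of the feature.
    """
    total = sum(usage.values())
    return {u: (n / total if total else 0.0) for u, n in usage.items()}


def feature_weight(usage, choices):
    """choices: user -> c(Uj, Fi), +1 to activate or -1 to remove.

    ASSUMPTION: the feature weight is the weighted sum of user choices.
    """
    weights = user_weights(usage)
    return sum(c * weights.get(u, 0.0) for u, c in choices.items())


def next_status(current, weight, rules_allow_change=True):
    """Apply the decision rule: positive activates, negative deactivates,
    zero (or a change forbidden by the rules) keeps the current status."""
    if weight > 0 and rules_allow_change:
        return True
    if weight < 0 and rules_allow_change:
        return False
    return current


usage = {"u1": 6, "u2": 3, "u3": 1}
choices = {"u1": -1, "u2": 1, "u3": 1}
w = feature_weight(usage, choices)
print(w, next_status(True, w))
```

Under these assumptions, the heaviest user of the feature outweighs two lighter users who prefer the opposite setting, which matches the stated idea that more frequent use gives a user more influence.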
Appendix C: Figure 7. SMESE Meta Entity Model
Appendix D: Figure 8. SMESE Metadata Model
Appendix E: Figure 9. Example of a SMESE Semantic Matrix Model