Abstract:
Ontology is an effective conceptualization commonly used for the Semantic Web. Fuzzy logic can be incorporated into ontology to represent uncertain information. Typically, a fuzzy ontology is generated from a predefined concept hierarchy. However, constructing a concept hierarchy for a given domain can be a difficult and tedious task. To tackle this problem, this paper proposes FOGA (Fuzzy Ontology Generation frAmework) for the automatic generation of fuzzy ontology from uncertain information. The FOGA framework comprises the following components: Fuzzy Formal Concept Analysis, Concept Hierarchy Generation, and Fuzzy Ontology Generation. We also discuss approximate reasoning for incremental enrichment of the ontology with newly arriving data. Finally, a fuzzy-based technique for integrating other database attributes into the ontology is proposed.
1 INTRODUCTION
Ontology is a conceptualization of a domain into a human-understandable, machine-readable format consisting of entities, attributes, relationships, and axioms [1]. It is used as a standard knowledge representation for the Semantic Web [2]. However, the conceptual formalism supported by a typical ontology may not be sufficient to represent the uncertain information commonly found in many application domains, because the concepts of these domains often lack clear-cut boundaries. For example, a document can be very relevant, relevant, or irrelevant to a research area. In addition, keywords extracted from scientific publications can be used to infer the corresponding research areas. However, it is inappropriate to treat all keywords equally, as some keywords may be more significant than others.
To tackle this type of problem, one possible solution is to incorporate fuzzy logic [3] into ontology to handle uncertain data. Traditionally, fuzzy ontology is generated and used in text retrieval [4] and search engines [5], in which membership values are used to evaluate the similarities between the concepts in a concept hierarchy. However, manual generation of fuzzy ontology from a predefined concept hierarchy is a difficult and tedious task that often requires expert interpretation. Automatic generation of the concept hierarchy and fuzzy ontology from the uncertain data of a domain is therefore highly desirable.
In this paper, we propose a framework known as FOGA (Fuzzy Ontology Generation frAmework) that can automatically generate a fuzzy ontology from uncertain data based on Formal Concept Analysis (FCA) [6] theory. The generated fuzzy ontology is mapped to a semantic representation in OWL (Web Ontology Language) [7]. The rest of this paper is organized as follows: Section 2 discusses related work on ontology generation and FCA. Section 3 gives some basic definitions and operators of fuzzy theory. The FOGA framework is presented in Section 4. Section 5 discusses the approximate reasoning technique used to incrementally enrich the generated ontology with new instances. The problem of integrating extra database attributes into the ontology is addressed in Section 6. Performance evaluation of the proposed FOGA framework is given in Section 7. Finally, Section 8 concludes the paper.
2 RELATED WORK
2.1 Ontology Generation
Although editing tools [8], [9] have been developed to help users create and edit ontology, manually deriving ontology from data is a troublesome task. Typically, ontology can be generated from various data types such as textual data [10], [11], [12], [13], [14], [15], [16], [17], dictionaries [18], [19], knowledge bases [20], semistructured schemata [21], [22], [23], and relational schemata [24], [25], [26], [27]. Compared to other types of data, ontology generation from textual data has attracted the most attention. Among the techniques used for processing textual data, clustering is one of the most effective for ontology learning. Conceptual clustering techniques such as COBWEB [28] and CLASSIT [29] are powerful clustering techniques that can conceptualize clusters for ontology generation [16], [17].
2.2 Formal Concept Analysis
FCA is a formal technique for data analysis and knowledge representation. It defines formal contexts to represent relationships between objects and attributes in a domain. From the formal contexts, FCA can then generate formal concepts and interpret the corresponding concept lattice, so that information can be browsed or retrieved effectively.
FCA is widely used in various applications, such as text processing [30], [31], [32], ontology merging [33], [34], e-mail management [35], [36], e-learning [37], Web navigation [38], and expert systems [39]. However, as most concept lattices are quite complicated in terms of the number of concepts generated, it is necessary to simplify the generated lattice. In the Iceberg concept lattice [40], association rules are typically used for clustering concepts on the lattice. Conceptual scaling [41] or lattice theory [42] is then used to generate the concept hierarchy in the TOSCANA [43] and GALOIS [44] systems, respectively. In order to prune the lattices generated for text mining, clustering is first performed on the data set to generate clusters of documents [45], [46]. Then, feature selection is used to extract frequent keywords (or terms) from the documents in each cluster as attributes for the cluster.
Traditional FCA-based conceptual clustering approaches are hardly able to represent such vague information. To tackle this problem, fuzzy logic can be incorporated into FCA to handle uncertain information for conceptual clustering and concept hierarchy generation. Pollandt [47], Burusco and Fuentes-González [48], and Huynh and Nakamori [49] have proposed the L-Fuzzy context as an attempt to combine fuzzy logic with FCA. The L-Fuzzy context uses linguistic variables, which are linguistic terms associated with fuzzy sets, to represent uncertainty in the context. However, human interpretation is required to define the linguistic variables. Moreover, the fuzzy concept lattice generated from the L-Fuzzy context usually causes a combinatorial explosion of concepts compared to the traditional concept lattice.
We propose a new technique that combines fuzzy logic and FCA as Fuzzy Formal Concept Analysis (FFCA), in which uncertain information is directly represented by a real-valued membership value in the range [0, 1]. As such, linguistic variables are no longer needed. Compared to the fuzzy concept lattice generated from the L-Fuzzy context, the fuzzy concept lattice generated using FFCA will be simpler in terms of the number of formal concepts. It also supports a formal mechanism for calculating concept similarities.
3 FUZZY THEORY
In this section, we review some fundamental knowledge of fuzzy theory [3].
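Definition 1 (Fuzzy Set). A fuzzy set A on a domain U is defined by a membership function μ from U to [0, 1]; that is, each item in A is assigned a membership value by μ. We denote by y(S) a fuzzy set generated from a traditional set of items S, in which each item of S has a membership value in [0, 1]. S can also be called a crisp set.
Definition 2 (Fuzzy Relation). A fuzzy set A on the domain G × M is a fuzzy relation on G and M, where G and M are two crisp sets.
Definition 3 (Fuzzy Set Intersection). The intersection of fuzzy sets A and B is denoted A ∩ B. The membership value of an item in A ∩ B is the minimum of its membership values in A and B.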
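As a minimal sketch of Definitions 1 and 3, a fuzzy set can be held as a mapping from items to membership values. The names `FuzzySet`, `fuzzy_set_from_crisp`, and `fuzzy_intersection`, and the document memberships, are illustrative assumptions and do not come from the paper.

```python
from typing import Callable, Dict, Hashable, Iterable

FuzzySet = Dict[Hashable, float]  # item -> membership value in [0, 1]


def fuzzy_set_from_crisp(items: Iterable[Hashable],
                         mu: Callable[[Hashable], float]) -> FuzzySet:
    """Build a fuzzy set from a crisp set S using a membership function mu."""
    fuzzy = {}
    for item in items:
        value = mu(item)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"membership of {item!r} is outside [0, 1]")
        fuzzy[item] = value
    return fuzzy


def fuzzy_intersection(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Membership in the intersection is the minimum of the two memberships.

    Items missing from a dictionary are treated as having membership 0
    (an assumption of this sketch), so they drop out of the result.
    """
    result = {}
    for item in a.keys() & b.keys():
        value = min(a[item], b[item])
        if value > 0.0:
            result[item] = value
    return result


# Hypothetical memberships of three documents in two research areas.
area_a = fuzzy_set_from_crisp(["D1", "D2", "D3"],
                              {"D1": 0.8, "D2": 0.3, "D3": 0.0}.get)
area_b = {"D1": 0.5, "D3": 0.9}
both = fuzzy_intersection(area_a, area_b)
print(both)
```

Storing only nonzero memberships keeps the dictionaries small, at the cost of treating "absent" and "membership 0" as the same thing.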
The max-min composition indicates the strength of the relation between the elements of X and Z.
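The definition of the max-min composition is not reproduced in these notes. The sketch below uses the standard fuzzy-logic form and assumes an intermediate set Y: for a fuzzy relation R on X × Y and a fuzzy relation S on Y × Z, the strength between x and z is the maximum over y of min(R(x, y), S(y, z)). The function name and the sample values are hypothetical.

```python
from typing import Dict, Hashable, Tuple

FuzzyRelation = Dict[Tuple[Hashable, Hashable], float]


def max_min_composition(r: FuzzyRelation, s: FuzzyRelation) -> FuzzyRelation:
    """Compose R (on X x Y) with S (on Y x Z) using max-min composition."""
    composed: FuzzyRelation = {}
    for (x, y_r), mu_r in r.items():
        for (y_s, z), mu_s in s.items():
            if y_r != y_s:
                continue  # the two pairs must share the intermediate element y
            strength = min(mu_r, mu_s)  # strength of the path x -> y -> z
            if strength > composed.get((x, z), 0.0):
                composed[(x, z)] = strength  # keep the strongest path
    return composed


# Hypothetical relations: x1 is linked to z1 through y1 and through y2.
r = {("x1", "y1"): 0.7, ("x1", "y2"): 0.4}
s = {("y1", "z1"): 0.5, ("y2", "z1"): 0.9}
print(max_min_composition(r, s))
```

Each path is only as strong as its weakest link (the min), and the relation between x and z takes the strongest available path (the max).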
4 THE FOGA FRAMEWORK
Fig. 1 shows the proposed FOGA (Fuzzy Ontology Generation frAmework), which consists of the following components.
4.1 Fuzzy Formal Concept Analysis
The Fuzzy Formal Concept Analysis incorporates fuzzy logic into Formal Concept Analysis to represent vague information.
Fuzzy formal context can also be represented as a cross-table as shown in Table 1. The context has three objects representing three documents, D1, D2, and D3. It also has three attributes, “Data Mining,” “Clustering,” and “Fuzzy Logic” representing three research topics. The relationship between an object and an attribute is represented by a membership value in [0, 1].
An α-cut can be set to eliminate relations that have low membership values. Table 2 shows the cross-table of the fuzzy formal context given in Table 1 with α = 0.5.
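The α-cut can be sketched as a filter over the cross-table. The membership values below are illustrative placeholders, not the contents of Table 1, and keeping values equal to α (using ≥ rather than >) is an assumption of this sketch.

```python
from typing import Dict

# object -> {attribute -> membership value in [0, 1]}
FuzzyContext = Dict[str, Dict[str, float]]


def alpha_cut(context: FuzzyContext, alpha: float) -> FuzzyContext:
    """Drop every object-attribute relation whose membership is below alpha."""
    return {
        obj: {attr: mu for attr, mu in row.items() if mu >= alpha}
        for obj, row in context.items()
    }


# Hypothetical fuzzy formal context with three documents and three topics.
context = {
    "D1": {"Data Mining": 0.8, "Clustering": 0.1, "Fuzzy Logic": 0.6},
    "D2": {"Data Mining": 0.9, "Clustering": 0.85, "Fuzzy Logic": 0.1},
    "D3": {"Data Mining": 0.1, "Clustering": 0.1, "Fuzzy Logic": 0.9},
}
pruned = alpha_cut(context, alpha=0.5)
for obj, row in pruned.items():
    print(obj, row)
```

The surviving relations keep their original membership values, so the pruned context is still fuzzy rather than binary.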
Generally, we can consider the attributes of a formal concept as the description of the concept. Thus, the relationship between an object and the concept should be the intersection of the relationships between the object and the attributes of the concept. Since each relationship between the object and an attribute is represented as a membership value in the fuzzy formal context, the intersection of these membership values should be their minimum, as in Definition 3.
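The rule above can be sketched as a minimum over the attributes that describe a concept. The function name, the sample context, and the convention that an empty attribute set yields membership 1.0 are assumptions made for illustration.

```python
from typing import Dict, Iterable

# object -> {attribute -> membership value in [0, 1]}
FuzzyContext = Dict[str, Dict[str, float]]


def object_membership(context: FuzzyContext, obj: str,
                      attributes: Iterable[str]) -> float:
    """Membership of obj in a concept described by the given attributes.

    The value is the minimum of the object's memberships over the attributes.
    A missing relation counts as 0; an empty attribute set gives 1.0
    (both are assumptions of this sketch).
    """
    row = context[obj]
    return min((row.get(attr, 0.0) for attr in attributes), default=1.0)


# Hypothetical context after an alpha-cut; values are placeholders.
context = {
    "D1": {"Data Mining": 0.8, "Fuzzy Logic": 0.6},
    "D2": {"Data Mining": 0.9, "Clustering": 0.85},
}
print(object_membership(context, "D1", ["Data Mining", "Fuzzy Logic"]))
print(object_membership(context, "D2", ["Data Mining", "Fuzzy Logic"]))
```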
Note that the fuzzy formal concept in Definition 1 can be considered a special case of a many-valued context [13]. However, our fuzzy-based modification of FCA, as presented in Definitions 9 and 11, preserves the distinct continuous membership values of objects, which are crucial for calculating the similarities between concepts. In a formal context, a concept can have many superconcepts and subconcepts. However, the similarities of a concept to its superconcepts and subconcepts differ. Such information cannot be shown in a traditional concept lattice. With a fuzzy concept lattice, we can make use of fuzzy set theory to calculate the similarities between a concept and its subconcepts.
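The similarity formula itself (Definition 11) is not reproduced in these notes. As one plausible reading, the sketch below compares the fuzzy object sets of two concepts with a fuzzy Jaccard ratio: the cardinality of their intersection (sum of minima) divided by the cardinality of their union (sum of maxima). Both the formula and the sample memberships are assumptions, not the paper's definition.

```python
from typing import Dict

FuzzySet = Dict[str, float]  # object -> membership value in [0, 1]


def concept_similarity(extent_a: FuzzySet, extent_b: FuzzySet) -> float:
    """Fuzzy Jaccard ratio between the fuzzy object sets of two concepts."""
    objects = extent_a.keys() | extent_b.keys()
    # Cardinality of the fuzzy intersection: sum of pointwise minima.
    inter = sum(min(extent_a.get(o, 0.0), extent_b.get(o, 0.0)) for o in objects)
    # Cardinality of the fuzzy union: sum of pointwise maxima.
    union = sum(max(extent_a.get(o, 0.0), extent_b.get(o, 0.0)) for o in objects)
    return inter / union if union > 0.0 else 0.0


# Hypothetical concept and one of its subconcepts.
concept = {"D1": 0.8, "D2": 0.9}
subconcept = {"D2": 0.85}
print(concept_similarity(concept, subconcept))
```

Because the ratio uses the membership values directly, two subconcepts with the same crisp objects can still have different similarities to their parent, which a traditional lattice cannot express.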
Fig. 2 shows the traditional concept lattice generated from Table 1 without membership values. Fig. 3 shows the fuzzy concept lattice generated from the fuzzy formal context given in Table 2, in which the similarities between the concepts are given. Fig. 4 shows the L-fuzzy lattice generated by replacing the membership values in Table 2 with the corresponding linguistic values “low,” “medium,” and “high.” Clearly, the fuzzy concept lattice is simpler than the L-fuzzy lattice in terms of the number of formal concepts. It can also provide additional information, such as the membership values of objects in each fuzzy formal concept and the similarities of fuzzy formal concepts, which are important for the construction of the concept hierarchy.
4.2 Concept Hierarchy Generation
Concept Hierarchy Generation clusters the fuzzy concept lattice generated by FFCA to construct a concept hierarchy in the following two steps.
4.2.1 Fuzzy Conceptual Clustering
As with a traditional concept lattice, the fuzzy concept lattice generated using FFCA is sometimes quite complicated due to the large number of fuzzy formal concepts generated. Since the formal concepts are generated mathematically, objects that differ only slightly in their attribute values are classified into distinct formal concepts. Such objects should belong to the same concept when they are interpreted by humans.
Thus, we cluster formal concepts into conceptual clusters using fuzzy conceptual clustering. Compared to traditional clusters, the conceptual clusters generated have the following properties:
• Each conceptual cluster is considered a human-interpretable concept in the domain of the fuzzy concept lattice.
• Each conceptual cluster is a sublattice extracted from the fuzzy concept lattice.
• A formal concept must belong to at least one conceptual cluster. For example, a scientific document can belong to more than one research area.
Conceptual clusters are generated based on the premise that if a formal concept A belongs to a conceptual cluster R, then its subconcept B also belongs to R if B is similar to A. We can use a similarity confidence threshold Ts to determine whether two concepts are similar or not.
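Figs. 5a and 5b show the conceptual clusters generated from the fuzzy concept lattice in Fig. 3 with Ts = 0.4 and Ts = 0.5, respectively. Fig. 6 gives the algorithm for generating conceptual clusters from a starting concept Cs of a fuzzy concept lattice F(K). To generate all the conceptual clusters of F(K), we choose Cs = sup(F(K)).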
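The algorithm of Fig. 6 is not reproduced in these notes. The sketch below is one possible reading of the clustering premise: starting from Cs, a subconcept joins its parent's cluster when their similarity reaches Ts, and otherwise becomes the starting concept of a new cluster. The lattice encoding (`children`, `similarity`), the function name, and the sample values are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def conceptual_clusters(children: Dict[str, List[str]],
                        similarity: Dict[Tuple[str, str], float],
                        start: str, ts: float) -> List[Set[str]]:
    """Group lattice concepts into clusters, walking down from `start`."""
    clusters: List[Set[str]] = []
    assigned: Set[str] = set()   # concepts already placed in some cluster
    pending = deque([start])     # candidate starting concepts of new clusters
    while pending:
        root = pending.popleft()
        if root in assigned:
            continue             # already covered by an earlier cluster
        cluster = {root}
        stack = [root]
        while stack:
            parent = stack.pop()
            for child in children.get(parent, []):
                if similarity[(parent, child)] >= ts:
                    if child not in cluster:
                        cluster.add(child)   # similar: same cluster as parent
                        stack.append(child)
                else:
                    pending.append(child)    # dissimilar: may start a cluster
        clusters.append(cluster)
        assigned |= cluster
    return clusters


# Hypothetical lattice: "top" plays the role of sup(F(K)).
children = {"top": ["c1", "c2"], "c1": ["c3"], "c2": ["c3"]}
similarity = {("top", "c1"): 0.6, ("top", "c2"): 0.3,
              ("c1", "c3"): 0.45, ("c2", "c3"): 0.7}
print(conceptual_clusters(children, similarity, "top", ts=0.4))
print(conceptual_clusters(children, similarity, "top", ts=0.5))
```

In this reading a concept may end up in more than one cluster, which is consistent with the requirement that a formal concept belong to at least one conceptual cluster. Raising Ts splits the lattice into smaller clusters.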
A conceptual cluster can be considered a set of fuzzy formal concepts. Each concept is associated with a set of objects and attributes. As such, each conceptual cluster can also be represented as sets of objects and attributes. Moreover, each object in each conceptual cluster should have a membership value indicating the degree of uncertainty of the fact “the object belongs to the conceptual cluster.” Therefore, we give the following formal definitions: