    Leveraging external user-generated information for large-scale data integration.


    The proliferation of data sources in both the private and public domains (e.g., in enterprise environments and on the World Wide Web) underscores the need for data integration systems. The purpose of a data integration system is to enable users to access data residing in multiple heterogeneous sources through a uniform interface. Manual solutions for building such systems are not a viable option, especially when dealing with large-scale and complex applications.

    This dissertation studies the automation of building data integration systems. In particular, it addresses three key challenges that lie at the heart of any such system.

    The first challenge relates to the construction of wrappers for unstructured sources. A source wrapper ensures that the data in the underlying source is perceived as structured data by the other parts of the system. We focus in particular on sources containing data formatted as lists, and propose a new solution for extracting relational tables from them. The proposed solution is completely unsupervised and domain-independent. It is based on leveraging various sources of information, including a corpus of tens of millions of relational tables published by users on the Web.

    The second and third challenges are concerned with establishing semantic mappings across data sources. We first propose a new solution for discovering the correspondences across the elements of two schemas. Then, based on these simple correspondences, we propose another solution to discover more complex declarative mapping rules that can actually be used to transform data and queries across the two schemas. The key underpinning of these two solutions is that, unlike previous approaches, they both exploit the usage information extracted from database query logs. This work is the first to introduce the usage-based approach for establishing mappings across data sources.

    To evaluate our approaches, we conducted experiments using realistic data sets: real web lists for the wrapper construction work, and schemas and query logs from the retail and life sciences domains for the work on semantic mappings. The experimental results verified the effectiveness and applicability of our proposed approaches.


    • Author: Elmeleegy, Hazem
    • Author affiliation: Purdue University
    • Degree-granting institution: Purdue University
    • Discipline: Computer Science
    • Degree: Ph.D.
    • Year: 2010
    • Pages: 154 p.
    • Total pages: 154
    • Original format: PDF
    • Language of text: English