
    Leveraging external user-generated information for large-scale data integration.


    Abstract

    The proliferation of data sources in both the private and public domains (e.g., in enterprise environments and on the World Wide Web) underscores the need for data integration systems. The purpose of a data integration system is to enable users to access data residing in multiple heterogeneous sources through a uniform interface. Manual solutions for building such systems are not a viable option, especially when dealing with large-scale and complex applications.

    This dissertation studies the automation of building data integration systems. In particular, it addresses three key challenges that lie at the heart of any such system.

    The first challenge relates to the construction of wrappers for unstructured sources. A source wrapper ensures that the data in the underlying source is perceived as structured data by the other parts of the system. We focus in particular on sources containing data formatted as lists, and propose a new solution for extracting relational tables from them. The proposed solution is completely unsupervised and domain-independent. It is based on leveraging various sources of information, including a corpus of tens of millions of relational tables published by users on the Web.

    The second and third challenges are concerned with establishing semantic mappings across data sources. We first propose a new solution for discovering the correspondences across the elements of two schemas. Then, based on these simple correspondences, we propose another solution to discover more complex declarative mapping rules that can actually be used to transform data and queries across the two schemas. The key underpinning for these two solutions is that, unlike previous approaches, they both exploit the usage information extracted from database query logs. This work is the first to introduce the usage-based approach for establishing mappings across data sources.

    To evaluate our approaches, we conducted experiments using realistic data sets: real web lists for the wrapper construction work, and schemas and query logs from the retail and life sciences domains for the work on semantic mappings. The experimental results have verified the effectiveness and applicability of our proposed approaches.

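    Bibliographic record

    • Author

      Elmeleegy, Hazem

    • Author affiliation

      Purdue University

    • Degree-granting institution: Purdue University
    • Discipline: Computer Science
    • Degree: Ph.D.
    • Year: 2010
    • Pages: 154 p.
    • Total pages: 154
    • Original format: PDF
    • Language of text: eng
    • Chinese Library Classification
    • Keywords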
