ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Ihre E-Mail wurde erfolgreich gesendet. Bitte prüfen Sie Ihren Maileingang.

Leider ist ein Fehler beim E-Mail-Versand aufgetreten. Bitte versuchen Sie es erneut.

Vorgang fortführen?

Exportieren
  • 1
    Publikationsdatum: 2012-10-23
    Beschreibung: Data can be represented in many different ways within a particular document or set of documents. Hence, attempts to automatically process the relationships between documents or determine the relevance of certain document objects can be problematic. In this study, we have developed software to automatically catalog objects contained in HTML files for patents granted by the United States Patent and Trademark Office (USPTO). Once these objects are recognized, the software creates metadata that assigns a data type to each document object. Such metadata can be easily processed and analyzed for subsequent text mining tasks. Specifically, document similarity and clustering techniques were applied to a subset of the USPTO document collection. Although our preliminary results demonstrate that tables and numerical data do not provide quantifiable value to a document’s content, the stage for future work in measuring the importance of document objects within a large corpus has been set.
    Digitale ISSN: 1999-4893
    Thema: Informatik
    Publiziert von MDPI Publishing
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
Schließen ⊗
Diese Webseite nutzt Cookies und das Analyse-Tool Matomo. Weitere Informationen finden Sie hier...