13 October 2010 UGFIDD: Unstructured Geospatial File Indexer and Distributed Dissemination


1 13 October 2010 UGFIDD Unstructured Geospatial File Indexer and Distributed Dissemination

2 Present Scenario
Data
– Users need data in low-communication situations
Transported
– Very slow
– Bottleneck
Search Criteria
– User must know what to search on

3 UGFIDD Overview
Provide a simple-to-use Web Service interface
– This allows for customized clients
– Free-text, “Google”-like searches
– Completely unstructured data
– No need for a data model to follow
– Communication is done over HTTP through SOAP (Simple Object Access Protocol) messages
– Currently supports PDFs, Microsoft Word docs, and JPEGs
Provide usable return types
RSS Feeds – Allow users to subscribe to standing queries
KML Results – Allow users to visually represent their data spatially
Plain Text – Give users their information quickly and reliably
BitTorrent – Allow users to distribute data quickly and in a distributed fashion

4 Code Environment
Subversion
– After writing a lot of code and spending a lot of time on it, version control provides peace of mind
– All of the code was written under version control. This is very important in today’s commercial environment.
– Allows many developers to work at the same time
– Assists with merge conflicts
– Makes reverts and diffs easy
Maven
– New and up-and-coming build tool
– Allows for easy integration and dependency management
– Build configuration (the POM) is written entirely in XML
– Repositories allow open source projects to be easily pulled in to assist in program development

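5 Pom File (Snippet)
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.p2p</groupId>
  <artifactId>Peer2peer</artifactId>
  <packaging>war</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>Peer2peer</name>
  <url>http://maven.apache.org</url>
  <!-- other values shown on the slide: UTF-8, 1.0, ${artifactId}, 1.6 -->
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>jpath</groupId>
      <artifactId>jpathwatch</artifactId>
      <version>0.93</version>
    </dependency>
  </dependencies>
</project>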

6 High Level Architecture
[Diagram] Components shown: Extractors + Publishers; Doc / JPEG Parser; BitTorrent Publisher; Core Services; Indexing; SoapUI; Solr; Interfaces; Jetty; Daemon Startup & Shutdown; File Monitor; RSS Feed / KML Feed; Query; Endpoints; Web Service Endpoints; Utilities

7 Ingest Orchestration: Ingest of a file
– Uses Apache Tika document extractors to extract header information along with binary data
– JPEG parser parses geospatial data
– Ingest monitor is triggered by system-level events
– The Solr schema has been customized to store location and other valuable data
[Diagram] File System → File Monitor → Metadata Extraction (File → XML Metadata) → HTTP → Solr
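A minimal sketch of an ingest monitor driven by system-level file events. The POM on slide 5 lists jpathwatch; this sketch swaps in the standard `java.nio.file.WatchService` (Java 7 and later) instead. The class name `IngestMonitorSketch`, the extension filter, and the callback are hypothetical and are not taken from the UGFIDD code.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Locale;
import java.util.function.Consumer;

class IngestMonitorSketch {

    /** True for the file types the overview slide lists as supported (assumed extensions). */
    static boolean isSupported(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".pdf") || lower.endsWith(".doc")
                || lower.endsWith(".jpg") || lower.endsWith(".jpeg");
    }

    /**
     * Hands each newly created, supported file to the callback.
     * Runs until the watched directory becomes inaccessible.
     */
    static void watch(Path ingestDir, Consumer<Path> onNewFile)
            throws IOException, InterruptedException {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            ingestDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watcher.take(); // blocks until the OS reports an event
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a real monitor might rescan here
                    }
                    Path created = ingestDir.resolve((Path) event.context());
                    if (isSupported(created.getFileName().toString())) {
                        onNewFile.accept(created);
                    }
                }
                if (!key.reset()) {
                    break; // directory is no longer watchable
                }
            }
        }
    }
}
```

In a real ingest path the callback would run metadata extraction and post the result to the index; here it is left to the caller.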

8 Query Orchestration: Publish of results
– Depending on the return type and call, UGFIDD will query and return customized results
– Example: user enters the query “Syracuse”
[Diagram] User query → HTTP → Web Service Endpoint → Core Services (parse query) → HTTP → Solr (use query to search index) → Files and XML Metadata → Publish: Torrent, RSS, Google Earth
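The query flow can be sketched as two small steps: turning the user's text into a Solr search request, and choosing a response format from the requested return type. This is an illustration only. The `/select?q=` request form is Solr's standard search handler, but the base URL, the `ReturnType` enum, and the method names are assumptions, not the UGFIDD API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QuerySketch {

    /** Hypothetical return types, mirroring the ones named on the overview slide. */
    enum ReturnType { PLAIN_TEXT, RSS, KML, TORRENT }

    /** Builds a Solr search URL for a free-text query. */
    static String solrSelectUrl(String solrBaseUrl, String userQuery) {
        try {
            return solrBaseUrl + "/select?q=" + URLEncoder.encode(userQuery, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Picks the MIME type of the published result for a given return type. */
    static String contentTypeFor(ReturnType type) {
        switch (type) {
            case RSS:     return "application/rss+xml";
            case KML:     return "application/vnd.google-earth.kml+xml";
            case TORRENT: return "application/x-bittorrent";
            default:      return "text/plain";
        }
    }
}
```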

9 GeoHash (Example GeoHash.java)
Geohash algorithm recently developed by Gustavo Niemeyer
Publicly released in 2008
Very new way of representing geospatial data
UGFIDD takes advantage of the single hash produced by the algorithm
Found many implementations in other languages (e.g. Python); ported one to Java for the UGFIDD project
Distance searches
– Geohash produces bounding boxes by nature
– This is a perfect fit for UGFIDD and its free-text search capability
– Geospatial searches are now extremely fast and easy to implement
– No need for complicated point-radius algorithms, which slow processing down
WKT (Well-Known Text)
– A spec for representing vector geometry on a single line
– User can query using single strings and does not need to represent points as Lat, Lon
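A minimal sketch of Geohash encoding, to show why a single string can stand in for a bounding box. This is a generic implementation of the public algorithm, not the project's GeoHash.java; the class and method names are hypothetical. Each character adds five bits that alternately halve the longitude and latitude ranges, so a shorter hash is a prefix of a longer one and names a larger enclosing cell.

```java
class GeoHashSketch {

    // Geohash base-32 alphabet (digits plus letters, without a, i, l, o)
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a latitude/longitude pair into a geohash of the given length. */
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90;
        double lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else            { value = value << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else            { value = value << 1;       latMax = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // five bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

For example, `GeoHashSketch.encode(42.6, -5.6, 5)` yields `ezs42`, and the 3-character hash of the same point is its prefix `ezs`. A prefix match on the indexed hash therefore acts as a coarse bounding-box search, with the caveat that two nearby points on opposite sides of a cell boundary may not share a prefix.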

10 Placeholder for Geohash performance

11 WKT
POINT(6 10)
LINESTRING(3 4,10 50,20 25)
POLYGON((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2))
MULTIPOINT((3.5 5.6), (4.8 10.5))
MULTILINESTRING((3 4,10 50,20 25),(-5 -8,-10 -8,-15 -4))
MULTIPOLYGON(((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2)),((6 3,9 2,9 4,6 3)))
Source: http://en.wikipedia.org/wiki/Well-known_text
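To illustrate querying with a single WKT string, here is a small sketch that parses only the 2D `POINT` form from the examples above. The class name and the regular expression are hypothetical; a production system would more likely use a geometry library than a hand-written parser. WKT lists coordinates as `x y`, which for geographic data conventionally means longitude first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WktPointSketch {

    // Matches e.g. "POINT(6 10)" or "point ( -76.15 43.05 )"
    private static final Pattern POINT = Pattern.compile(
            "^\\s*POINT\\s*\\(\\s*(-?\\d+(?:\\.\\d+)?)\\s+(-?\\d+(?:\\.\\d+)?)\\s*\\)\\s*$",
            Pattern.CASE_INSENSITIVE);

    /** Returns {x, y} for a 2D POINT, or null if the text is any other geometry. */
    static double[] parsePoint(String wkt) {
        Matcher m = POINT.matcher(wkt);
        if (!m.matches()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }
}
```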

12 BitTorrent
Allows for distributed downloading
Users download .torrent files, which identify the tracker and describe the file or files
Many free clients are available
BitTorrent takes pressure off the central server
– Users only download the .torrent file
– Communicate via the tracker (UGFIDD uses an open source tracker)
– Users download from each other while there is a seed
– UGFIDD will always be the initial seeder
Extremely fast downloads
– Users download from each other and do not tie up the bandwidth going to the server
– Utilizes file pieces described in the .torrent file (pieces are downloaded from each other)
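The "file pieces described in the .torrent file" are SHA-1 hashes of fixed-size slices of the content, which lets a client verify each piece regardless of which peer supplied it. The sketch below shows only that hashing step. The class and method names are hypothetical, and the bencoded metadata, tracker announce, and peer protocol are all omitted.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

class TorrentPiecesSketch {

    /** Splits data into pieces of pieceLength bytes and returns one 20-byte SHA-1 per piece. */
    static List<byte[]> pieceHashes(byte[] data, int pieceLength) {
        if (pieceLength <= 0) {
            throw new IllegalArgumentException("pieceLength must be positive");
        }
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            List<byte[]> hashes = new ArrayList<>();
            for (int offset = 0; offset < data.length; offset += pieceLength) {
                int length = Math.min(pieceLength, data.length - offset); // last piece may be short
                sha1.update(data, offset, length);
                hashes.add(sha1.digest()); // digest() also resets the MessageDigest
            }
            return hashes;
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Lower-case hex rendering, handy for logging or comparing hashes. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```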

13 Torrent
The torrent file has been created and seeded. Others can now download the torrent file and connect to the swarm. The file will then be downloaded from the server as well as from other clients.

14 RSS Feed
Users want their information even when they aren’t there
RSS Feed allows the user to set up a specific query and walk away
– Query will be “standing” for a configurable amount of time
– Feed will be updated as the query is hit
– Fast, easy-to-learn publish-and-subscribe system
– Most users know how to use RSS (easy to use)
RSS page is unique to that user and query
– User can, however, pass the URL to other users, who can then subscribe to the query too
– Example: A group of users is interested in “IED and Iraq”. An RSS query is set up; as products are placed into the monitor directory, that information is passed on to the users’ RSS feed
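A standing query can be published as an ordinary RSS 2.0 document that is regenerated whenever a newly ingested product matches. The sketch below builds such a feed as a string. The class name, the title wording, and the `{title, link}` representation of a hit are assumptions for illustration, not the UGFIDD feed format.

```java
import java.util.List;

class RssFeedSketch {

    /** Escapes the characters that are not allowed verbatim in XML text. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Builds an RSS 2.0 feed for one standing query; each hit is {title, link}. */
    static String feed(String query, String feedUrl, List<String[]> hits) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<rss version=\"2.0\">\n<channel>\n");
        // title, link and description are the three required channel elements
        xml.append("<title>").append(escapeXml("Standing query: " + query)).append("</title>\n");
        xml.append("<link>").append(escapeXml(feedUrl)).append("</link>\n");
        xml.append("<description>").append(escapeXml("Products matching " + query))
           .append("</description>\n");
        for (String[] hit : hits) {
            xml.append("<item><title>").append(escapeXml(hit[0])).append("</title>")
               .append("<link>").append(escapeXml(hit[1])).append("</link></item>\n");
        }
        xml.append("</channel>\n</rss>\n");
        return xml.toString();
    }
}
```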

15 Google Earth
KML (Keyhole Markup Language)
– XML data that Google Earth knows how to display
Visually represent data
– More and more users are using tools to see their data visually
– Can see similarities (such as distance and location)
– Quickly find relevant data
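As a sketch of what a published KML result might contain, the code below emits a single Placemark for one located product. The class and method names are hypothetical and the real UGFIDD output may carry more fields. One detail worth noting is that KML writes coordinates as longitude,latitude, the reverse of the usual spoken order.

```java
import java.util.Locale;

class KmlSketch {

    /** Minimal escaping for text placed inside an XML element. */
    private static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a KML document with one Placemark at the given position. */
    static String placemark(String name, double lat, double lon) {
        // Locale.ROOT keeps the decimal separator a dot on every machine
        String coordinates = String.format(Locale.ROOT, "%.6f,%.6f", lon, lat);
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<kml xmlns=\"http://www.opengis.net/kml/2.2\">\n"
                + "<Placemark>\n"
                + "<name>" + esc(name) + "</name>\n"
                + "<Point><coordinates>" + coordinates + "</coordinates></Point>\n"
                + "</Placemark>\n"
                + "</kml>\n";
    }
}
```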

16 JPEG product displayed via published KML

17 GeoCoder
UGFIDD utilizes geocoder web services provided by Google
Passing in a String returns a Lat/Lon for the location, or null if nothing is found
Example:
– User searches for “Syracuse”
– UGFIDD will return hits for documents that contain “Syracuse” and also geospatial results near Syracuse, NY
– http://code.google.com/apis/maps/documentation/geocoding/
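The null-or-coordinates contract described above can be sketched behind a small interface, so the search code does not care whether the answer came from a remote service. Everything here is hypothetical: the `GeoCoder` interface, the in-memory `StubGeoCoder` that stands in for the Google web service, and the approximate coordinates used for Syracuse, NY.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class GeoCoderSketch {

    /** Returns {lat, lon} for a place name, or null if nothing is found. */
    interface GeoCoder {
        double[] geocode(String place);
    }

    /** In-memory stand-in for a remote geocoding service (assumption, for illustration). */
    static class StubGeoCoder implements GeoCoder {
        private final Map<String, double[]> known = new HashMap<>();

        StubGeoCoder() {
            known.put("syracuse", new double[] {43.0481, -76.1474}); // approximate
        }

        @Override
        public double[] geocode(String place) {
            return known.get(place.trim().toLowerCase(Locale.ROOT));
        }
    }

    /** Free-text search always runs; a spatial clause is added only when geocoding succeeds. */
    static String describeSearch(String query, GeoCoder geoCoder) {
        double[] latLon = geoCoder.geocode(query);
        if (latLon == null) {
            return "text:" + query;
        }
        return "text:" + query + " + near:" + latLon[0] + "," + latLon[1];
    }
}
```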

18 Future Work
Make it faster!
– Multiple Solr instances; distributed data implementation
– Java ExecutorService allows for multi-threaded workers. This has been implemented but will take time to tune for each system
Create a client
– Currently UGFIDD is a server-only implementation
– Creating a client is easy with web services
– Allow users to ingest files using HTTP and FTP upload
Distributed Queries
– Currently only one server is queried at a time
– Would like to make a middle “tracker” to distribute queries and results
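The multi-threaded worker idea can be sketched with a fixed-size `ExecutorService`: each file becomes a task, and the pool size is the knob that needs tuning per system. The class name, the placeholder `extractMetadata` step, and the result strings are hypothetical and do not reflect the actual UGFIDD implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IngestPoolSketch {

    /** Placeholder for the real per-file work (parsing, metadata extraction, indexing). */
    static String extractMetadata(String fileName) {
        return "indexed:" + fileName;
    }

    /** Processes all files on a pool of the given size; results keep the input order. */
    static List<String> ingestAll(List<String> fileNames, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (final String name : fileNames) {
                Callable<String> task = () -> extractMetadata(name);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // waits for that task to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // let worker threads exit once the queue is drained
        }
    }
}
```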

19 Demo
Server is running on my home computer with an ingest directory already set up
Will move files into the ingest directory
Demonstrate query capability
Demonstrate publishing capability
Will use soapUI, a web services test utility, to demonstrate client interaction: http://www.soapui.org/
Code is located at: http://code.google.com/p/peer2peersuny/source/browse/

20 UGFIDD Questions?

