Regions of Interest: What’s in a ROI? Use cases, Requirements, Current Storage System, Problems, Alternative Storage

1 Regions of Interest

2
• What’s in a ROI?
• Use cases
• Requirements
• Current Storage System
• Problems
• Alternative Storage

3
• ROI
• Geometry
• Measurements
• ROI on Channel
• Annotations
  ▪ ROI
  ▪ Measurement
  ▪ Links

4
• User created ROI
• Measurement tools
• HCS generated ROI
• Automatic
• External
• External analysis
• Particle Tracking
• Other
• Templates
• ROIs without images

5
• Human generated
• More interactions
  ▪ Merge, Propagate, Split, Delete
• Measurements
  ▪ Geometry
  ▪ Intensity
  ▪ Path
• ROI/ROI Links
• Tags mostly on ROI
• Write Many/Read Many

6
• HCS Generated ROI
• Lots of ROI
• Attached to Channel
• Measurements Attached
  ▪ Multiple measurements
• Tags on ROI, Measurements
  ▪ Analysis, results and meta.
• Write Once, Read Many

7
• External tools can generate ROI (+ scripts)
• Can be tagged
• Links (ROI/ROI, ROI/Image)
• Results can be in any format

8
• ROI need not be attached to an image
• Template to define other ROI

9
• N-Dimensional Data
• Storage of image data is simple
• ROI are more complex
  ▪ Database entry, file format
• We don’t just want to store in HDF

10
• Database
• ROI
• ROI Annotations
• PyTables
• Mask ROI
• Measurements

11
• PyTables
• ROI are heterogeneous
• Concurrency
• Python behind a core service call
• Measurements are optimal
• Tagging is an issue
  ▪ Inside file
  ▪ Multiple annotations reported to be slow

12
• ROI can be stored in the database
• Mask data can be an issue
• Tagging in an RDBMS is not ideal
• Many more annotations than we’d like
• Link to external source for measurements

13
• Key-Value Pair Stores
  • Berkeley DB
  • Project Voldemort
  • Tokyo Cabinet
• Document DB
  • MongoDB
  • CouchDB
• Graph DB
  • Neo4J
  • InfoGrid
• Table DB
  • Cassandra
  • Hypertable
  • HBase

14
• Other opinions on the storage solutions
• MongoDB vs CouchDB, Cassandra, ...
• CouchDB vs MongoDB
• Pros and cons of MongoDB
• Digg on Cassandra
• What is a supercolumn
• Cassandra talk
• Indexing nodes in Neo4J

15
• Document Database
• NoSQL movement
• Schemaless
• No Tables
  ▪ Collections of like data
• No Joins
  ▪ Document is equivalent of row of data
  ▪ Distributed file system (GridFS)

16
Pros
• It has bindings to numerous languages (C++, C#, Java, Python, ...).
• Allows storage, indexing and linking of any user data
• Annotations are now very easy and efficient
• Has mechanisms for schema upgrade
• Dynamic queries
• Replication
• Sharding
• Map-Reduce framework
• Fast
• GridFS is a distributed file storage mechanism within Mongo.
• Easy to install
Cons
• Schemaless, so data integrity will need to be worked on.
• Graph structures not inherently supported.
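
The Map-Reduce item above can be illustrated without a database. The following is a minimal sketch in plain Python, assuming ROI documents shaped like the ones used later in the deck; the function names (`map_roi`, `reduce_counts`, `count_shape_tags`) and the sample data are hypothetical and are not part of MongoDB's API. It only shows the map step emitting key/value pairs and the reduce step summing them per key.

```python
from collections import defaultdict


def map_roi(roi):
    """Map step: emit one (tag, 1) pair for every tag on every shape of a ROI."""
    for shape in roi.get("shapes", []):
        for tag in shape.get("tags", []):
            yield tag["tag"], 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_shape_tags(rois):
    """Run the map step over every ROI, then reduce all emitted pairs."""
    pairs = (pair for roi in rois for pair in map_roi(roi))
    return reduce_counts(pairs)


# Hypothetical sample data, invented for illustration only.
rois = [
    {"id": 1, "shapes": [
        {"type": "Ellipse", "tags": [{"tag": "mitosis"}]},
        {"type": "Rect", "tags": [{"tag": "mitosis"}, {"tag": "nucleus"}]},
    ]},
    {"id": 2, "shapes": [{"type": "Ellipse", "tags": [{"tag": "nucleus"}]}]},
    {"id": 3, "shapes": []},
]

print(count_shape_tags(rois))
```

In a real deployment the map and reduce functions would run inside the store, close to the data, rather than in the client as they do here.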

17 DEPLOYMENTS
• SourceForge http://sourceforge.net/
• BusinessInsider http://www.businessinsider.com/
• New York Times http://www.nytimes.com/
• Disqus http://www.disqus.com/

18
Human Interaction
• Merge, Propagate, Split ✓
• Geometry ✓
• Intensity ✓
• Path ✓
• ROI/ROI Links ✓
• Tags ✓
HCS
• Many ROI ✓
• Tags on ROI ✓
• Tags on Measurement ✓
• Tables of Measurements ✓
Externally Generated
• Tags ✓
• ROI/ROI Links, ROI/Image Links ✗
• Many formats, unknown types ✓
Other
• N-Dimensional ROI ✓
• Hierarchical Structures ✓

19
# Requires pymongo. The original slide used the older Connection() and insert() calls;
# MongoClient and insert_one are their current equivalents.
from pymongo import MongoClient

connection = MongoClient()
db = connection['databaseName']
collection = db['collectionName']

# One ROI document holding two ellipse shapes, each carrying its own tags.
collection.insert_one({
    "tags": [],
    "label": "MyROI",
    "shapes": [
        {"tags": [{"tag": "foo1", "namespace": "bob"}],
         "rx": 17, "ry": 17, "label": None, "cy": 75, "cx": 3,
         "t": 0, "z": 0, "type": "Ellipse", "id": 3},
        {"tags": [{"tag": "foo2", "namespace": "bob"}],
         "rx": 10, "ry": 16, "label": None, "cy": 82, "cx": 45,
         "t": 0, "z": 0, "type": "Ellipse", "id": 5},
    ],
    "type": "Roi",
    "id": 565,
})
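
The geometry measurements listed in the requirements can be derived directly from the fields stored on each shape. The following is a minimal sketch, assuming the field names of the ROI document above (`cx`, `cy`, `rx`, `ry`); the helper names `ellipse_area` and `ellipse_contains` are hypothetical and are not part of OMERO or pymongo.

```python
import math


def ellipse_area(shape):
    """Area of an axis-aligned ellipse with radii rx and ry."""
    return math.pi * shape["rx"] * shape["ry"]


def ellipse_contains(shape, x, y):
    """True if the point (x, y) lies inside or on the ellipse boundary."""
    dx = (x - shape["cx"]) / shape["rx"]
    dy = (y - shape["cy"]) / shape["ry"]
    return dx * dx + dy * dy <= 1.0


# The two shapes from the ROI document on slide 19, reduced to their geometry.
shapes = [
    {"id": 3, "type": "Ellipse", "cx": 3, "cy": 75, "rx": 17, "ry": 17},
    {"id": 5, "type": "Ellipse", "cx": 45, "cy": 82, "rx": 10, "ry": 16},
]

for shape in shapes:
    print(shape["id"], round(ellipse_area(shape), 2))
```

An intensity measurement could reuse `ellipse_contains` to pick the pixels that belong to the shape before averaging them.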

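20
# Requires pymongo; the connection setup is the same for both queries.
from pymongo import MongoClient

connection = MongoClient()
db = connection['databaseName']
collection = db['collectionName']

# Find ROI shapes with a tag containing "mitosis" (case-insensitive).
# A quoted '/.*mitosis.*/i' would be matched as a literal string from Python,
# so the regular expression is passed with the $regex operator instead.
collection.find({"shapes.tags.tag": {"$regex": "mitosis", "$options": "i"}})

# Find ROI with tag foofoo and shapes with tag foo1.
collection.find({"shapes.tags.tag": "foo1", "tags.tag": "foofoo"})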
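
How a dotted path such as "shapes.tags.tag" reaches into nested arrays can be sketched in plain Python. This is a simplified model for illustration only, not MongoDB's actual matching code; the helper names `values_at` and `matches` and the sample document are hypothetical.

```python
import re


def values_at(doc, path):
    """Collect every value reachable at a dotted path, descending into lists."""
    current = [doc]
    for key in path.split("."):
        found = []
        for item in current:
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    found.append(candidate[key])
        current = found
    # Flatten one level so a path ending on a list yields its elements.
    flat = []
    for value in current:
        if isinstance(value, list):
            flat.extend(value)
        else:
            flat.append(value)
    return flat


def matches(doc, query):
    """True if every path in the query has at least one matching value."""
    for path, expected in query.items():
        values = values_at(doc, path)
        if hasattr(expected, "search"):  # a compiled regular expression
            ok = any(isinstance(v, str) and expected.search(v) for v in values)
        else:
            ok = any(v == expected for v in values)
        if not ok:
            return False
    return True


# Hypothetical sample document, invented for illustration only.
MITOSIS = re.compile("mitosis", re.IGNORECASE)
doc = {
    "tags": [{"tag": "foofoo"}],
    "shapes": [
        {"tags": [{"tag": "foo1"}]},
        {"tags": [{"tag": "Mitosis-early"}]},
    ],
}

print(values_at(doc, "shapes.tags.tag"))
print(matches(doc, {"shapes.tags.tag": "foo1", "tags.tag": "foofoo"}))
print(matches(doc, {"shapes.tags.tag": MITOSIS}))
```

Note that in this sketch the two conditions of a query may be satisfied by different array elements, which is why a single ROI can match on one shape's tag and on its own top-level tag at once.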

21
• Graph Database
• Uses nodes to represent objects
• User specifies relationships between nodes
• Allows complex traversal of node structures
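
The node, relationship and traversal ideas above can be sketched with a small in-memory structure. This is a minimal illustration in plain Python, not the Neo4j API; the `Graph` class, its methods and the sample node ids are hypothetical, and the relationship names simply echo the ones used in the deck's own example.

```python
from collections import deque


class Graph:
    """Tiny in-memory graph: nodes with properties, typed directed edges."""

    def __init__(self):
        self.nodes = {}   # node id -> dict of properties
        self.edges = {}   # node id -> list of (relationship type, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def relate(self, source, rel_type, target):
        self.edges[source].append((rel_type, target))

    def traverse(self, start, rel_type):
        """Breadth-first walk following only edges of the given type."""
        seen = {start}
        queue = deque([start])
        order = []
        while queue:
            current = queue.popleft()
            for edge_type, target in self.edges[current]:
                if edge_type == rel_type and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


# Hypothetical sample data: an image, two derived images and one ROI.
graph = Graph()
graph.add_node("image1", name="original")
graph.add_node("image2", name="cropped")
graph.add_node("image3", name="cropped and filtered")
graph.add_node("roi7", type="Ellipse")
graph.relate("image1", "DERIVE", "image2")
graph.relate("image2", "DERIVE", "image3")
graph.relate("image1", "ASSOCIATE", "roi7")

print(graph.traverse("image1", "DERIVE"))
```

The traversal answers a question such as "which images were derived, directly or indirectly, from this one" without any joins, which is the kind of query the graph model is aimed at.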

22
PROS
• Handles graph structures nicely
• Transactional
• Supported by Gremlin
• Native RDF http://components.neo4j.org/neo-rdf-sail/
• Easy to install
CONS
• No C++ language binding.
• Not distributed.
• Tables are not so easily modeled.
• Difficult to query on node contents

23 DEPLOYMENTS
• The Swedish Defence Forces http://www.mil.se
• Windh Technologies http://www.windh.com
• Flextoll http://www.flextoll.se

24
public enum OMERORelations implements RelationshipType {
    ASSOCIATE, DERIVE, AGGREGATE, COMPOSE
}

// Node for the source image.
Node image = neo.createNode();
image.setProperty("IObject", imageI);
image.setProperty("id", imageI.getId().getValue());
image.setProperty("name", imageI.getName().getValue());

// Node for the image derived from it.
Node derivedImage = neo.createNode();
derivedImage.setProperty("IObject", derivedImageI);
derivedImage.setProperty("id", derivedImageI.getId().getValue());
derivedImage.setProperty("name", derivedImageI.getName().getValue());

// The DERIVE relationship records how the derived image was produced.
Relationship relationship = image.createRelationshipTo(derivedImage, OMERORelations.DERIVE);
relationship.setProperty("type", "ROI");
relationship.setProperty("operation", "crop");
relationship.setProperty("roi", cropRoiI);

25
Human Interaction
• Merge, Propagate, Split ✓
• Geometry ✗
• Intensity ✗
• Path ✓
• ROI/ROI Links ✓
• Tags ✗
HCS
• Many ROI ✓
• Tags on ROI ✓
• Tags on Measurement ✓
• Tables of Measurements ✗
Externally Generated
• Tags ✓
• ROI/ROI Links, ROI/Image Links ✓
• Many formats, unknown types ✗
Other
• N-Dimensional ROI ✗
• Hierarchical Structures ✓

26 An implementation of Google’s Bigtable model: a complex implementation of a key/value store used to represent a table. A sophisticated toolset is required to get the most out of this solution; for instance, Google has created Sawzall to query its system, and Digg has released Lazyboy, a Python library for working with Cassandra. It works by creating a table whose columns are grouped into column families; like data lives in the same column family (for example, Ellipse ROI).
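
The column-family layout described above can be pictured as a nested key/value map: row key, then column family, then column. The following is a minimal sketch in plain Python, not the Cassandra API; the `ColumnFamilyTable` class, the row keys and the family names are hypothetical.

```python
class ColumnFamilyTable:
    """Nested key/value model: row key -> column family -> column -> value."""

    def __init__(self, families):
        self.families = set(families)   # families are fixed when the table is made
        self.rows = {}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        """Return a copy of all columns stored for one row in one family."""
        return dict(self.rows.get(row_key, {}).get(family, {}))


# Hypothetical sample data: like data (ellipses) shares one column family.
table = ColumnFamilyTable(["ellipse", "rect", "tags"])
table.put("roi565:shape3", "ellipse", "cx", 3)
table.put("roi565:shape3", "ellipse", "cy", 75)
table.put("roi565:shape3", "tags", "foo1", "bob")
table.put("roi566:shape1", "rect", "width", 20)

print(table.get_family("roi565:shape3", "ellipse"))
print(table.get_family("roi566:shape1", "ellipse"))
```

Each row only stores the columns it actually has, so an ellipse row and a rectangle row can sit in the same table without empty placeholder columns.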

27
Pros
• Quick
• Handles heterogeneous data well
• Different rows can have different columns
• Can manage distributed data
• Map/Reduce
• Focus on writes not reads
• Scales nicely
• Easy to install
Cons
• Not simple to work with
• Building hierarchical structures
• Sorting
• Querying
  ▪ Ad hoc queries are bad; Digg still uses MySQL for certain queries.
• Have to manage secondary indexes (K/V)
• Version 0.5
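
Managing secondary indexes by hand means that every write has to update both the primary record and a second key/value map. The following is a minimal sketch in plain Python, not the Cassandra API; the `IndexedStore` class, its methods and the sample records are hypothetical.

```python
class IndexedStore:
    """Key/value store with a hand-maintained secondary index on tags."""

    def __init__(self):
        self.primary = {}   # roi id -> record
        self.by_tag = {}    # tag -> set of roi ids (the secondary index)

    def put(self, roi_id, record):
        # Remove stale index entries before overwriting an existing record.
        old = self.primary.get(roi_id)
        if old is not None:
            for tag in set(old["tags"]):
                ids = self.by_tag[tag]
                ids.discard(roi_id)
                if not ids:
                    del self.by_tag[tag]
        self.primary[roi_id] = record
        for tag in record["tags"]:
            self.by_tag.setdefault(tag, set()).add(roi_id)

    def find_by_tag(self, tag):
        """Look up roi ids through the index instead of scanning every record."""
        return sorted(self.by_tag.get(tag, set()))


# Hypothetical sample data, invented for illustration only.
store = IndexedStore()
store.put(1, {"tags": ["mitosis", "nucleus"]})
store.put(2, {"tags": ["mitosis"]})
print(store.find_by_tag("mitosis"))

store.put(1, {"tags": ["nucleus"]})   # retagging must also clean the index
print(store.find_by_tag("mitosis"))
```

The bookkeeping in `put` is the cost the slide refers to: the store itself does not keep the index consistent, so the application has to.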

28 Deployments
• Facebook (MAYBE!!) http://www.facebook.com
• Digg http://www.digg.com

29
Human Interaction
• Merge, Propagate, Split ✓
• Geometry ✓
• Intensity ✓
• Path ✗
• ROI/ROI Links ✗
• Tags ✓
HCS
• Many ROI ✓
• Tags on ROI ✓
• Tags on Measurement ✓
• Tables of Measurements ✓
Externally Generated
• Tags ✓
• ROI/ROI Links, ROI/Image Links ✓
• Many formats, unknown types ✗
Other
• N-Dimensional ROI ✓
• Hierarchical Structures ✗

30 An implementation of Google’s Bigtable model: a complex implementation of a key/value store used to represent a table. A sophisticated toolset is required to get the most out of this solution; for instance, Google has created Sawzall to query its system. Hypertable has a query language called HQL. It works by creating a table whose columns are grouped into column families; like data lives in the same column family (for example, Ellipse ROI).

31
Pros
• Quick
• Handles heterogeneous data well
• Different rows can have different columns
• Can manage distributed data
• Map/Reduce
• Scales nicely
• Easy to install
Cons
• GPL License
• Building hierarchical structures
• Docs are weak
• HQL works for simple queries only
• Map/Reduce for other work
• Limit of 255 column families
• Secondary keys

32 Deployments
• Rediff http://www.rediff.com
• Zvents http://www.zvents.com/

33
Human Interaction
• Merge, Propagate, Split ✓
• Geometry ✓
• Intensity ✓
• Path ✗
• ROI/ROI Links ✗
• Tags ✓
HCS
• Many ROI ✓
• Tags on ROI ✓
• Tags on Measurement ✓
• Tables of Measurements ✓
Externally Generated
• Tags ✓
• ROI/ROI Links, ROI/Image Links ✓
• Many formats, unknown types ✗
Other
• N-Dimensional ROI ✓
• Hierarchical Structures ✗

34
• Why do we have an RDBMS?
• We don’t normalise the data
• Each import will normalise on:
  ▪ Image, ObjectiveSettings, LogicalChannel, LightSettings, DetectorSettings.
• Object Penalty
• Difference between normalisation and view
