Department of Computer Science & Engineering

Abstract:
Today, technology is one of the greatest advantages available to scientific work, and one such application is the simulation of particles. The project "Computing Distance Histograms Efficiently in Scientific Databases" moved the analysis of these simulations from manual labor to an automated process on a computer, a very difficult transition, but one the authors accomplished in an innovative and more efficient way with a new algorithm. That algorithm computes the distance between every pair of particles and records those distances in a histogram. The one remaining issue is the speed of these calculations: the existing code runs slower than desired. We therefore propose a new algorithm that shortens the analysis time. Time is valuable here, because the datasets from these simulations are very large and take a long time to analyze. The new algorithm gave very good results, and processing time decreased with the new implementation.

Background:
One of the biggest motivations of the project is the convenience that technology brings to this work. The work requires datasets from particle simulations of a single large system. Particle simulation is important because it models a large system as a collection of classical entities (particles). These datasets can contain millions to billions of particles, and analyzing them requires both configuration data and statistical calculations. The configuration stores important information about each particle, and the statistical calculation is the Spatial Distance Histogram (SDH): a histogram of the distances between all pairs of particles in the simulated system.

For these reasons, the largest datasets cannot be processed by human means; they require an advanced algorithm running on a computer. The original project therefore created an algorithm called the Density Map (DM). DM lets them compute the histogram with high accuracy and makes incredibly large datasets usable, but it is complicated, built on complex data structures (quad-trees) and the other machinery needed to organize the data in computer memory. Because of the dataset size and the number of calculations, its execution time is still very slow.

Fig. 1 Examples of astronomical and biological/chemical (molecular) simulations used as data sources.
Fig. 2 From a large dataset of size N, random particles are taken to form a new, smaller dataset of size n.
Fig. 3 How memory looks: the original array of size N occupies its memory blocks, and blocks are chosen at random for the new array of size n.

Objectives:
› Create an algorithm in C that shortens the current run time.
› Compare and analyze the difference in speed and accuracy between the old and new algorithms, to judge whether trading a small loss of accuracy or detail for a shorter run time is worthwhile.

Algorithm:
› The algorithm is a small piece of code inside the original Density Map code. It uses a function that takes the original data array of size N and shortens it to a smaller array of size n.
› The code shrinks the dataset by using a random function to produce a true/false flag; the flag decides whether the element at that position will be used or not.
› At the end, the code reassigns a new size to the original array and stores all the kept values there. The analysis is then performed on the smaller dataset.
› Pseudocode (a runnable C sketch follows this list):
      start with the original array, arrayA, which holds the input values
      for each element of arrayA, copy it to tempArray
      free the memory allocated to arrayA
      for each element of tempArray:
          flip a coin at random
          if the result is true (1), append the element to the newly allocated arrayA
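To make the steps above concrete, here is a minimal, self-contained C sketch of the sampling step followed by a brute-force SDH computation on the sample. This is not the project's actual Density Map code: the function names (sample_array, compute_sdh), the 3-D point type, the keep probability of 1/2 (the coin flip), and the bucket width and count are all illustrative assumptions.

    /* Sketch only: (1) downsample the particle array with a coin flip per
     * element, as in the pseudocode above; (2) compute the Spatial Distance
     * Histogram (SDH) of the sample. Compile with: cc sdh.c -lm */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <math.h>
    #include <time.h>

    typedef struct { double x, y, z; } Point;

    #define BUCKET_WIDTH 1.0   /* assumed width of each histogram bucket */
    #define NUM_BUCKETS  100   /* assumed number of buckets */

    /* Coin-flip sampling, mirroring the pseudocode: copy arrayA to a
     * temporary buffer, free the original allocation, then keep each element
     * with probability 1/2. Writes the kept count to *n. */
    Point *sample_array(Point *arrayA, size_t N, size_t *n)
    {
        Point *tempArray = malloc(N * sizeof *tempArray);
        memcpy(tempArray, arrayA, N * sizeof *tempArray);
        free(arrayA);                            /* original array released */

        Point *kept = malloc(N * sizeof *kept);  /* worst case: keep all */
        size_t count = 0;
        for (size_t i = 0; i < N; i++)
            if (rand() % 2)                      /* "flip a coin": keep on 1 */
                kept[count++] = tempArray[i];
        free(tempArray);

        *n = count;                              /* new, smaller size n <= N */
        return kept;
    }

    /* Brute-force SDH: bucket the distance between every pair of particles.
     * This is O(n^2), which is why shrinking n saves so much time. */
    void compute_sdh(const Point *p, size_t n, unsigned long hist[NUM_BUCKETS])
    {
        memset(hist, 0, NUM_BUCKETS * sizeof hist[0]);
        for (size_t i = 0; i < n; i++)
            for (size_t j = i + 1; j < n; j++) {
                double dx = p[i].x - p[j].x;
                double dy = p[i].y - p[j].y;
                double dz = p[i].z - p[j].z;
                double d = sqrt(dx * dx + dy * dy + dz * dz);
                size_t b = (size_t)(d / BUCKET_WIDTH);
                if (b >= NUM_BUCKETS) b = NUM_BUCKETS - 1; /* clamp overflow */
                hist[b]++;
            }
    }

    int main(void)
    {
        srand((unsigned)time(NULL));

        /* Toy input: N random particles in a 10 x 10 x 10 box. */
        size_t N = 10000;
        Point *arrayA = malloc(N * sizeof *arrayA);
        for (size_t i = 0; i < N; i++) {
            arrayA[i].x = 10.0 * rand() / RAND_MAX;
            arrayA[i].y = 10.0 * rand() / RAND_MAX;
            arrayA[i].z = 10.0 * rand() / RAND_MAX;
        }

        size_t n;
        Point *sample = sample_array(arrayA, N, &n); /* frees arrayA itself */
        printf("kept %zu of %zu particles\n", n, N);

        unsigned long hist[NUM_BUCKETS];
        compute_sdh(sample, n, hist);
        for (size_t b = 0; b < 5; b++)               /* first few buckets */
            printf("bucket %zu: %lu\n", b, hist[b]);

        free(sample);
        return 0;
    }

With a fair coin the expected sample size is n ≈ N/2, so the number of pairwise distances drops from N(N-1)/2 to roughly N^2/8, about 75% fewer distance computations under these assumptions. The bucket counts then estimate the shape of the full histogram rather than reproduce it exactly, which matches the accuracy loss reported in the Results.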
Results:
The result was what we expected: the run time was faster than with the original code, an improvement of approximately 85%. We lost some accuracy in the bucket counts, but this trade-off is affordable because of the gain in time.

[Fig. 4 Output of the results: side-by-side bucket counts from the original and new code, showing the roughly 85% reduction in time; the raw console listing is omitted here.]

                 Original Code    With New Code
    Power        25%              85%
    Accuracy     +                -

Fig. 4 Representation of the results. The new algorithm reduced the time by approximately 85% compared with the original; the only drawback is that accuracy was higher with the original code than with the new one.

Conclusion:
The idea behind the code worked well: we reached our goal, and the run time is 85% faster than the original. The code still needs improvement, because we want to add more accuracy and make it run even faster, but above all the results are excellent for our goal.

Acknowledgements:
To the University of South Florida for giving me the opportunity to participate in the REU. A most special thanks to Dr. Yicheng Tu, Dr. Miguel Labrador, and Prof. Daladier Jabba. Also to Universidad Metropolitana for helping me reach this goal.