
1 - Subproject 1 - Research and Development of a Data Management Broker for Heterogeneous Cloud Platforms. Network and Computing Lab.

2 Research Goals
–Techniques for defining mobile cloud metadata
–Techniques for mobile cloud metadata-based resource management and migration
–Techniques for improving service performance and guaranteeing user SLAs through resource and service profiling that accounts for the performance of heterogeneous cloud infrastructures
–Research on data provisioning that caches the data required for service execution to improve service speed and performance, and on real-time data supply for big data processing
–Research on data consolidation and provisioning techniques suited to data usage characteristics
–Research on data virtualization techniques for the integrated management of distributed and heterogeneous data in big data processing

3 Mobile Cloud Metadata-Based Resource Management and Migration Techniques

4 Service and Application Profiling
Cloud metadata
–Service and application profiles
–Resource profiles
Service and application profiling
–Basic approach (see the sketch below)
 Profiling the expected execution time by VM type (from historical data)
 Profiling application performance by resource usage
–Advanced approach
 Also considering resource-contention analysis among applications
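
A minimal sketch of the basic approach, assuming a simple in-memory history store; the class ExecutionProfiler and its methods are illustrative names, not from the slides:

```python
from collections import defaultdict

class ExecutionProfiler:
    """Predicts expected execution time per (application, VM type)
    from historical runs (the 'basic approach' on this slide)."""

    def __init__(self):
        # (app, vm_type) -> observed execution times in seconds
        self.history = defaultdict(list)

    def record(self, app, vm_type, exec_time):
        self.history[(app, vm_type)].append(exec_time)

    def expected_time(self, app, vm_type):
        runs = self.history[(app, vm_type)]
        if not runs:
            return None                # no historical data for this pair
        return sum(runs) / len(runs)   # simple mean as the estimate

profiler = ExecutionProfiler()
profiler.record("wordcount", "m4.large", 120.0)
profiler.record("wordcount", "m4.large", 132.0)
print(profiler.expected_time("wordcount", "m4.large"))  # 126.0
```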

5 Today’s topic: Classification schemes [Zhuravlev et al., SIGARCH, 2010]
–A classification scheme identifies which applications should and should not be scheduled together
–It enables the scheduler to predict the performance effect of co-scheduling any group of threads on a shared cache
–A VM placement & allocation algorithm consists of two components: the classification scheme and the policy

6 Classification Scheme 1) Stack Distance Competition (SDC) [Chandra et al., HPCA, 2005] (1)
Assumption: an L2 cache with LRU replacement
Stack distance profile (see the sketch below)
–Captures the temporal reuse behavior of an application in a fully- or set-associative cache
Basic prediction approach
–The profile measured for a larger cache can predict the miss rate for a smaller cache: hits at stack positions beyond the smaller cache's associativity become misses
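
A minimal sketch of building a stack distance profile from a cache-line access trace, under the slide's LRU assumption; the function name and trace format are illustrative:

```python
def stack_distance_profile(trace, n_ways):
    """Builds a stack distance profile for a fully-associative LRU cache.
    hits[i] counts hits at LRU-stack depth i+1 (depth 1 = MRU);
    cold accesses and reuses deeper than n_ways count as misses."""
    stack = []                  # unbounded LRU stack, index 0 = MRU
    hits = [0] * n_ways
    misses = 0
    for line in trace:
        if line in stack:
            depth = stack.index(line)   # 0-based stack distance
            if depth < n_ways:
                hits[depth] += 1
            else:
                misses += 1
            stack.remove(line)
        else:
            misses += 1
        stack.insert(0, line)           # accessed line becomes MRU
    return hits, misses

# Prediction for a smaller cache with k ways: hits at depths beyond k
# turn into misses, so misses_k = misses + sum(hits[k:]).
hits, misses = stack_distance_profile(["a", "b", "a", "c", "b", "a"], n_ways=4)
print(hits, misses)   # [0, 1, 2, 0] 3
```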

7 Classification Scheme 1) Stack Distance Competition (SDC) (2)
Objective
–To model how two applications compete for the LRU stack positions in the shared cache, and to estimate the extra misses each application incurs as a result of this contention
Main idea
–Construct a new stack distance profile that merges the individual stack distance profiles of the threads that run together

8 Classification Scheme 1) Stack Distance Competition (SDC) (3)
SDC algorithm (a sketch follows below)
1) Each individual profile is assigned a current pointer, initialized to point to the first stack distance position
2) The algorithm iterates A times over the positions in the profile, determining which of the co-runners is the "winner" for each stack-distance position
3) After the A-th iteration, the effective cache space for each thread is computed in proportion to the number of its stack distance counters included in the merged profile
→ The cache miss rate under the new effective cache space is then estimated for each co-runner
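
A minimal sketch of the merging step, assuming (as in Chandra et al.) that the winner at each position is the co-runner with the larger hit counter at its current pointer; the example profiles are made up:

```python
def sdc_merge(profiles, n_ways):
    """Stack Distance Competition sketch: merges individual stack
    distance profiles into one shared-cache profile of n_ways positions.
    profiles[t][i] = thread t's hits at stack depth i+1.
    Returns the number of positions each thread wins."""
    pointers = [0] * len(profiles)        # step 1: per-profile pointers
    won = [0] * len(profiles)
    for _ in range(n_ways):               # step 2: fill the merged profile
        # winner = thread with the largest counter at its current pointer
        winner = max(range(len(profiles)),
                     key=lambda t: (profiles[t][pointers[t]]
                                    if pointers[t] < len(profiles[t]) else -1))
        won[winner] += 1
        pointers[winner] += 1             # only the winner's pointer advances
    return won                            # step 3: effective cache space is
                                          # proportional to positions won

# Estimating extra misses: hits at stack depths beyond a thread's
# effective space are assumed to become misses in the shared cache.
prof_a = [900, 500, 300, 200, 100, 50, 25, 10]
prof_b = [800, 700, 600, 100, 80, 60, 40, 20]
ways = sdc_merge([prof_a, prof_b], n_ways=8)                  # [5, 3]
extra = [sum(p[w:]) for p, w in zip([prof_a, prof_b], ways)]  # [85, 300]
```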

9 Classification Scheme 2) Animal Classes [Xie et al., CMP-MSI, 2008] (1)
This classification scheme classifies applications in terms of their influence on each other when co-scheduled on the same cache
Four application classes
–Turtle (low use of the shared cache)
–Sheep (low miss rate, insensitive to the number of cache ways allocated to it)
–Rabbit (low miss rate, sensitive to the number of allocated cache ways)
–Devil (high miss rate, accesses the L2 cache very quickly)

10 Classification Scheme 2) Animal Classes (2)
Application classification algorithm
Symbiosis table (see the sketch below)
–Approximates the relative performance degradation for applications that fall into different animal classes
–Provides estimates of how well the various classes co-exist with each other on the same shared cache
This scheme uses stack distance profiles
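
A minimal sketch of a symbiosis-table lookup; the class names come from the slide, but the degradation values and function names are hypothetical placeholders, not the numbers from Xie et al.:

```python
# Hypothetical symbiosis table: estimated relative performance
# degradation of a <row> application sharing a cache with a <column>
# application. Values are illustrative placeholders only.
SYMBIOSIS = {
    "turtle": {"turtle": 0.0, "sheep": 0.0, "rabbit": 0.1, "devil": 0.2},
    "sheep":  {"turtle": 0.0, "sheep": 0.1, "rabbit": 0.2, "devil": 0.4},
    "rabbit": {"turtle": 0.1, "sheep": 0.2, "rabbit": 0.4, "devil": 0.8},
    "devil":  {"turtle": 0.2, "sheep": 0.3, "rabbit": 0.6, "devil": 0.9},
}

def co_schedule_penalty(class_a, class_b):
    """Estimated combined degradation when two applications of the
    given animal classes share one cache (both directions summed)."""
    return SYMBIOSIS[class_a][class_b] + SYMBIOSIS[class_b][class_a]

# A scheduler would prefer pairings with the lowest penalty:
print(co_schedule_penalty("turtle", "devil"))  # turtles tolerate devils well
print(co_schedule_penalty("rabbit", "devil"))  # rabbits suffer next to devils
```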

11 Classification Scheme 3) Miss Rate [Zhuravlev et al., SIGARCH, 2010] [Knauerhase et al., IEEE Micro, 2008] (1)
Identifying applications with high miss rates is very beneficial for the scheduler, because these applications exacerbate the performance degradation caused by memory controller contention, memory bus contention, and prefetching hardware contention
To approximate the best schedule using the miss rate heuristic, the scheduler identifies high-miss-rate applications and separates them into different caches, so that no cache has a much higher total miss rate than any other (see the sketch below)
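
A minimal sketch of the heuristic as a greedy balancing pass; the policy (highest miss rate first, onto the least-loaded cache) is an illustrative implementation choice, not taken verbatim from the paper:

```python
def balance_by_miss_rate(apps, n_caches):
    """Spreads applications across caches so that no cache ends up
    with a much higher total miss rate than any other.
    apps: list of (name, miss_rate) pairs."""
    caches = [{"apps": [], "total": 0.0} for _ in range(n_caches)]
    # place the highest-miss-rate applications first...
    for name, rate in sorted(apps, key=lambda a: a[1], reverse=True):
        # ...each onto the cache with the lowest total miss rate so far
        target = min(caches, key=lambda c: c["total"])
        target["apps"].append(name)
        target["total"] += rate
    return caches

apps = [("mcf", 40.0), ("lbm", 35.0), ("gcc", 5.0), ("namd", 1.0)]
for cache in balance_by_miss_rate(apps, n_caches=2):
    print(cache["apps"], cache["total"])
# ['mcf', 'namd'] 41.0
# ['lbm', 'gcc'] 40.0
```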

12 Classification Scheme 4) Pain [Zhuravlev et al., SIGARCH, 2010] (1)
Cache sensitivity
–A measure of how much an application will suffer when cache space is taken away from it due to contention
–Calculated by
 first, examining the number of cache hits that will most likely turn into misses when the cache is shared
 second, assigning loss probabilities to the positions in the stack distance profile, describing the likelihood that the hits will be lost from each position
–The loss probability for position i is i / (n+1) in this paper
–Cache sensitivity formula: S = Σ_{i=1}^{n} (i / (n+1)) · h(i), where h(i) is the number of hits to the i-th position in the stack, i = 1 being the MRU and i = n the LRU position of an n-way set-associative cache

13 Classification Scheme 4) Pain (2)
Cache intensity
–A measure of how much an application will hurt others by taking away their space in a shared cache
–Measured as the number of last-level cache accesses per one million instructions
The Pain metric (see the sketch below)
–The resulting Pain is obtained by combining cache sensitivity and cache intensity
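
A minimal sketch combining the two measures; the pairwise form Pain(A|B) = Sensitivity(A) × Intensity(B), summed in both directions, follows the paper's description, while the function names are illustrative:

```python
def sensitivity(hits):
    """Cache sensitivity S = sum_i (i / (n+1)) * h(i) from the
    previous slide, with hits[i-1] = h(i) for an n-way cache."""
    n = len(hits)
    return sum((i / (n + 1)) * h for i, h in enumerate(hits, start=1))

def pain(hits_a, intensity_a, hits_b, intensity_b):
    """Combined Pain of co-scheduling A and B on one shared cache.
    Intensity is last-level cache accesses per million instructions."""
    pain_a_due_to_b = sensitivity(hits_a) * intensity_b  # Pain(A|B)
    pain_b_due_to_a = sensitivity(hits_b) * intensity_a  # Pain(B|A)
    return pain_a_due_to_b + pain_b_due_to_a

# The scheduler evaluates candidate pairings and co-schedules the
# combination with the lowest total Pain.
```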

14 Classification Schemes Evaluation [Zhuravlev et al., SIGARCH, 2010]

15 Research on Data Consolidation and Provisioning Techniques Suited to Data Usage Characteristics

16 Today’s topic: Data placement & VM placement in big data processing
–Importance of data placement
 Input-data-intensive workloads such as Map
 Centralized file system vs. distributed file system
–Importance of VM placement
 Intermediate-data-intensive workloads such as Reduce
 Performance issues such as SLA and resource contention

17 Data placement & VM placement: Purlieus [Palanisamy et al., SC, 2011] (1)
Job classification
–Map-input heavy jobs (input-data-intensive workloads)
–Reduce-input heavy jobs (intermediate-data-intensive workloads)
–Map-and-Reduce-input heavy jobs (input-data- and intermediate-data-intensive workloads)

18 Data placement & VM placement: Purlieus (2)
Map-input heavy jobs (input-data-intensive workloads)
–Input data placement
 Choose physical machines based only on storage utilization and the expected load
–VM placement
 Data locality: choose the physical machines that hold the corresponding data

19 Data placement & VM placement: Purlieus (3)
Reduce-input heavy jobs (intermediate-data-intensive workloads)
–Input data placement
 Choose physical machines with the maximum free storage
–VM placement
 Choose physical machines that are close to each other
Map-and-Reduce-input heavy jobs (input-data- and intermediate-data-intensive workloads)
–Consider both criteria (see the sketch below)
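
A minimal sketch of the placement rules on slides 18 and 19; the machine model, the rack-based notion of "close", and the scoring are illustrative simplifications, not the actual Purlieus algorithm:

```python
def place_vms(job_type, n_vms, machines, data_hosts):
    """Picks physical machines for a job's VMs.
    machines: {name: {"free_slots": int, "rack": str}}
    data_hosts: set of machines holding the job's input data."""
    candidates = [m for m, info in machines.items() if info["free_slots"] > 0]
    if job_type == "map-input-heavy":
        # data locality: prefer machines that already hold the input data
        local = [m for m in candidates if m in data_hosts]
        rest = [m for m in candidates if m not in data_hosts]
        return (local + rest)[:n_vms]
    if job_type == "reduce-input-heavy":
        # keep VMs close to each other (same rack) so the heavy
        # intermediate-data shuffle stays local
        by_rack = {}
        for m in candidates:
            by_rack.setdefault(machines[m]["rack"], []).append(m)
        return max(by_rack.values(), key=len)[:n_vms]
    # map-and-reduce-input heavy: consider both criteria, preferring
    # data-local machines and grouping the rest by rack
    return sorted(candidates,
                  key=lambda m: (m not in data_hosts,
                                 machines[m]["rack"]))[:n_vms]
```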

20 Delay scheduling [Zaharia et al., EuroSys, 2010]
If no appropriate node holding the data for the first job in the job queue is available, delay that job and keep looking for an appropriate node, up to a certain period (see the sketch below)
–Preserves data locality in a streaming situation
[Figure: a physical machine (PM) processing a job queue, with the head job delayed]
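
A minimal sketch of the idea, using a per-job skip counter with an illustrative max_skips threshold standing in for the paper's wait-time bound:

```python
def delay_schedule(queue, free_node, wait_counts, max_skips=3):
    """When free_node becomes idle, scan the queue in order and launch
    the first job whose data lives on free_node. A job that has been
    skipped more than max_skips times runs non-locally anyway, so its
    delay is bounded. Jobs are dicts with 'id' and 'data_nodes'."""
    for job in list(queue):
        if free_node in job["data_nodes"]:
            queue.remove(job)
            return job          # data-local launch
        wait_counts[job["id"]] = wait_counts.get(job["id"], 0) + 1
        if wait_counts[job["id"]] > max_skips:
            queue.remove(job)
            return job          # waited long enough: run non-locally
    return None                 # nothing runnable; the node stays idle

queue = [{"id": 1, "data_nodes": {"pm2"}}, {"id": 2, "data_nodes": {"pm1"}}]
waits = {}
job = delay_schedule(queue, "pm1", waits)
print(job["id"], waits)   # 2 {1: 1}  (job 1 is delayed; job 2 runs locally)
```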

