Internship Summary (实习总结)


Internship Summary (实习总结)
赵路达

Projects (项目)
- LegoNet Gradient Checker
- Parametric Rectified Linear Unit (ReLU/PReLU): implementation on LegoNet + experiments on the 100y click-Data dataset
- LegoClassifyNet: implementation + experiments on the MNIST dataset
- LegoNet Visualizer

Gradient Checker

Background
- LegoNet: DNN framework
- Forward feed calculates scores from the first Layer to the last
- Back-propagation updates gradients from the last Layer to the first
- However, backpropagation code can be tricky to get right

We can use a numerical method, based on the definition of a gradient, to double-check the result:

$$\frac{\partial}{\partial x_i} f(x) \approx \frac{f(x + \varepsilon e_i) - f(x - \varepsilon e_i)}{2\varepsilon}, \quad \text{for very small } \varepsilon$$

In words: we perturb each input/parameter by a small $\varepsilon$ and check how much the result shifts relative to $\varepsilon$.
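The central-difference formula can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `numerical_gradient`, the toy function `f`, and the step size `eps=1e-5` are hypothetical and are not taken from LegoNet.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of df/dx_i for every coordinate i."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)   # copy so the caller's point is not modified
        x_minus = list(x)
        x_plus[i] += eps   # x + eps * e_i
        x_minus[i] -= eps  # x - eps * e_i
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * eps))
    return grad


def f(x):
    # Toy scalar function (hypothetical): f(x) = x0^2 + 3 * x0 * x1
    return x[0] ** 2 + 3.0 * x[0] * x[1]


point = [2.0, -1.0]
# Hand-derived gradient of the toy function: [2*x0 + 3*x1, 3*x0]
analytic = [2.0 * point[0] + 3.0 * point[1], 3.0 * point[0]]
numeric = numerical_gradient(f, point)
```

In a real checker, `f` would be the network's loss as a function of one Layer's inputs or parameters, and `analytic` would come from that Layer's backpropagation code.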

[Figure: gradient checker flowchart. Labels: Inputs, Parameters, Numerical Gradients, Gradients from backprop, Norm function, Outputs vectors, Verification (Yes! / No), Prints Report]
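The comparison step (norm function, then a pass/fail report) might look like the sketch below. The relative-error formula, the default tolerance, and all function names are assumptions chosen for illustration; the slides do not say which norm or threshold LegoNet uses.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))


def relative_error(analytic, numeric):
    """Norm of the difference, scaled by the norms of both gradient vectors."""
    diff = l2_norm([a - n for a, n in zip(analytic, numeric)])
    scale = l2_norm(analytic) + l2_norm(numeric)
    return 0.0 if scale == 0.0 else diff / scale


def check_gradients(analytic, numeric, tolerance=1e-6):
    """Print a one-line report and return True if the gradients agree."""
    err = relative_error(analytic, numeric)
    passed = err <= tolerance
    print("relative error = {:.3e} -> {}".format(err, "PASS" if passed else "FAIL"))
    return passed
```

Scaling by the gradient norms keeps the check meaningful for both very large and very small gradients, which a fixed absolute threshold would not.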

Configurable Testing
Each tested Layer is listed in prototxt format, together with its tolerance, input ranges, and other parameters.

Detailed Analysis
One script automatically runs all tests and prints detailed results to aid debugging.

Parametric Rectified Linear Unit (PReLU): Implementation + Experimentation

Motivation + Hypothesis
- ReLU activation units are widely used in Deep Learning due to their desirable non-linearity properties
- PReLU improves on ReLU by providing a trainable parameter that adjusts the non-linearity
- Has shown significant results (ImageNet)
- Question: what about NLP?

Implementation
- LegoNet: modular design, relatively easy to add new Layer classes
- ReLU Layer: Feed-Forward; Backpropagation (inputs)
- PReLU Layer: Feed-Forward; Backpropagation (PReLU params, inputs)
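A minimal sketch of the two layers, using the standard definitions (ReLU: max(0, x); PReLU: x for x > 0, a·x otherwise). It assumes a single parameter `a` shared across all units, and the function names are illustrative rather than LegoNet's actual Layer API.

```python
def relu_forward(x):
    return [max(0.0, v) for v in x]


def relu_backward(x, grad_out):
    # Gradient w.r.t. inputs: passes through where x > 0, zero elsewhere
    return [g if v > 0.0 else 0.0 for v, g in zip(x, grad_out)]


def prelu_forward(x, a):
    return [v if v > 0.0 else a * v for v in x]


def prelu_backward(x, a, grad_out):
    # Gradient w.r.t. inputs: slope 1 where x > 0, slope a elsewhere
    grad_x = [g if v > 0.0 else a * g for v, g in zip(x, grad_out)]
    # Gradient w.r.t. the shared parameter a: only non-positive inputs contribute
    grad_a = sum(g * v for v, g in zip(x, grad_out) if v <= 0.0)
    return grad_x, grad_a
```

With `a = 0` the PReLU functions reduce to ReLU, which is a convenient sanity check when adding the new Layer.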

Experiments
- Context: 2-hidden-layer Simnet DNN used for similarity rankings between query-title pairs
- Baseline: 2-hidden-layer network with softsign activation function

Goals
- Compare the speed and accuracy of ReLU and PReLU against the baseline softsign activation function
- Investigate the effect of the learning rate of the PReLU parameter a
- Test the effectiveness of the PReLU non-linear initialization proposed in the paper vs. the current default initialization (Xavier initialization)
- Investigate network structures using ReLU with possibly sparse output representations
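The two initialization schemes being compared can be sketched as follows. This assumes "the paper" is He et al. (2015), "Delving Deep into Rectifiers", whose initialization draws weights from a zero-mean Gaussian with standard deviation sqrt(2 / ((1 + a²) · fan_in)); the function names and the default a = 0.25 used here are illustrative, not LegoNet settings.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier (Glorot) uniform init: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def prelu_aware_std(fan_in, a=0.25):
    """Std of the rectifier-aware Gaussian init (assumed: He et al. 2015)."""
    return math.sqrt(2.0 / ((1.0 + a * a) * fan_in))


def prelu_aware_normal(fan_in, fan_out, rng, a=0.25):
    std = prelu_aware_std(fan_in, a)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
w_xavier = xavier_uniform(128, 64, rng)
w_prelu = prelu_aware_normal(128, 64, rng)
```

The rectifier-aware scheme depends only on fan-in and the slope a, while Xavier also uses fan-out, so the two differ most when layer widths are unbalanced.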

ReLU and PReLU compared to baseline
- Conclusion: ReLU Layer networks obtained worse results than the baseline, with similar results for PReLU networks, but more work is needed
- Future directions: more in-depth comparisons, including more tuning of PReLU networks

Initialization Comparison
- Conclusion: PReLU non-linear initialization performed worse than the default initialization
- Possible explanation: the initialization was proposed for extremely deep CNNs used in image processing and may not be applicable here
- Future direction: investigate other types of initialization

PReLU learning rates Still running…

Sparse Outputs Network with ReLU
- Conclusion: unbalanced structures work significantly better than balanced versions, but still suffer an accuracy penalty compared to the baseline
- Future direction: further testing with ReLU additions to the network. Perhaps the learning rate (LR) is too low?

Experimentation is hard!
- First experience with doing research + experimentation on a large-scale dataset
- Many challenges: debugging difficulties, lack of experience with multi-threading, accidentally rm-ing directories…
- However, many lessons learned: working with big datasets, how to devise good experiments, lots and lots of shell scripts, etc.

LegoClassifyNet: Experiments with MNIST

MNIST Dataset
- Open-source dataset of handwritten digits from 0-9, widely used as a benchmark in Deep Learning
- Small, easy and fast to train & debug
- Investigated PReLU/ReLU effectiveness in a classification task
- Achieved over 98% test-set accuracy with 2-hidden-layer NNs with PReLU units, matching most publicly published results

MNIST Experiments, #1
- Conclusion: ReLU/PReLU show improvement over other non-linear functions in the MNIST classification task, contrary to the click-data experiments
- PReLU converges slightly faster, with a similar final result on a 2-hidden-layer NN

MNIST Experiments, #2
- Conclusion: larger PReLU param learning rates lead to faster convergence on the MNIST dataset
- Future direction: more investigation of param a's effect on NN learning rate + accuracy
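One common way to give the PReLU parameter its own learning rate is a per-parameter multiplier on the base rate. The sketch below shows a plain SGD step under that assumption; `sgd_step` and `a_lr_mult` are hypothetical names, and the slides do not describe how LegoNet actually configures this.

```python
def sgd_step(weights, a, grad_w, grad_a, lr=0.01, a_lr_mult=1.0):
    """One SGD update in which the PReLU slope a uses lr * a_lr_mult."""
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    new_a = a - lr * a_lr_mult * grad_a
    return new_weights, new_a


# Same gradients, two different learning-rate multipliers for a
_, a_slow = sgd_step([1.0], 0.25, [0.5], -2.0, lr=0.1, a_lr_mult=1.0)
_, a_fast = sgd_step([1.0], 0.25, [0.5], -2.0, lr=0.1, a_lr_mult=10.0)
```

With the larger multiplier, `a` moves ten times as far per step while the ordinary weights are updated exactly as before.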

MNIST Experiments, #3
- Conclusion: PReLU non-linear initialization has no significant effect compared to the baseline. This matches the result of the click-data experiments

MNIST Experiments, #4
- Conclusion: additional hidden layers seem to improve accuracy, but the result is not significant
- Future direction: testing even deeper NNs with other structures

MNIST Experiments, #5
- Conclusion: a values increase from the 1st PReLU Layer forward
- Corresponds to steep non-linearity in the first layer, followed by strictly decreasing non-linearity in the following layers
- Future direction: more investigation into a values in various contexts

LegoClassifyNet
- Generalized the code used for MNIST classification to meet further needs for classification on LegoNet
- Implemented LegoClassifyNet, LegoClassifyTestNet classes
- Implemented new classify.cpp tool
- Wrote a wiki tutorial for working with MNIST in this framework, designed for first-time users

Network Visualizer

Graph Visualization
- Complex LegoNet configuration files (prototxts) need visualization
- Converts LegoNet prototxt format into renderable .dot text
- Built in pure JS: directly embeddable into any webpage
- Utilizes open-source JS parsing + rendering libraries:
  - Viz.js
  - pbparser.js
  - Google Image API (first version)
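The conversion to .dot text can be illustrated with a short sketch. It is written in Python rather than the visualizer's JS, and it starts from an already-parsed list of layers; the `name` / `type` / `inputs` fields are a hypothetical structure, since the LegoNet prototxt schema is not shown in the slides.

```python
def layers_to_dot(layers, graph_name="net"):
    """Render a list of layer dicts as Graphviz DOT text."""
    lines = ["digraph {} {{".format(graph_name)]
    # One box-shaped node per layer, labelled with its name and type
    for layer in layers:
        label = "{}\\n({})".format(layer["name"], layer["type"])
        lines.append('  "{}" [shape=box, label="{}"];'.format(layer["name"], label))
    # One edge from each upstream layer to the layer that consumes it
    for layer in layers:
        for src in layer.get("inputs", []):
            lines.append('  "{}" -> "{}";'.format(src, layer["name"]))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical three-layer network
example = [
    {"name": "data", "type": "Input"},
    {"name": "fc1", "type": "FullyConnected", "inputs": ["data"]},
    {"name": "prelu1", "type": "PReLU", "inputs": ["fc1"]},
]
dot_text = layers_to_dot(example)
```

The resulting text can be handed to any DOT renderer, such as Viz.js in a browser or the Graphviz `dot` command-line tool.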

Output

Luda at Baidu

Thanks to: my mentor 董大祥, the LegoNet group, and the entire NLP-SC team