实习总结 (Internship Summary)


实习总结 (Internship Summary) 赵路达

Projects
- LegoNet Gradient Checker
- Parametric Rectified Linear Unit (ReLU/PReLU): implementation on LegoNet + experiments on the 100y click-data dataset
- LegoClassifyNet: implementation + experiments on the MNIST dataset
- LegoNet Visualizer

Gradient Checker

Background
- LegoNet: DNN framework
- Feed-forward calculates scores from the first Layer to the last
- Backpropagation propagates gradients from the last Layer to the first
- However, backpropagation code can be tricky to get right

We can use a numerical method based on the definition of a gradient to double-check the result:

$$\frac{\partial}{\partial x_i} f(x) \approx \frac{f(x + \varepsilon e_i) - f(x - \varepsilon e_i)}{2\varepsilon}, \quad \text{for very small } \varepsilon$$

Here $e_i$ is the unit vector along coordinate $i$. In words: we perturb each input/parameter by a small $\varepsilon$ and check how much the result shifts relative to $\varepsilon$.
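The central-difference formula can be sketched in a few lines of Python. This is only an illustration, not the LegoNet implementation: the test function `f`, the hand-derived `analytic_gradient` (standing in for a backprop result), and the value of `eps` are all assumptions made for the example.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of df/dx_i for every coordinate i."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps   # x + eps * e_i
        x_minus[i] -= eps  # x - eps * e_i
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * eps))
    return grad


def f(x):
    # Hypothetical test function: f(x) = x0^2 + 3 * x0 * x1
    return x[0] ** 2 + 3.0 * x[0] * x[1]


def analytic_gradient(x):
    # Hand-derived gradient of f, standing in for a backprop result.
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]


point = [1.5, -2.0]
num = numerical_gradient(f, point)
ana = analytic_gradient(point)
max_abs_diff = max(abs(n - a) for n, a in zip(num, ana))
print("numerical:", num)
print("analytic: ", ana)
print("max abs difference:", max_abs_diff)
```

Each coordinate costs two evaluations of `f`, so a check like this is normally run on small layers and small inputs only.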

[Flowchart: Inputs and Parameters → Gradients from backprop and Numerical Gradients → Output vectors → Norm function → Verification (Yes! / No) → Prints Report]

Configurable Testing
- Each tested Layer is listed in prototxt format, together with its tolerance, input ranges, and other parameters

Detailed Analysis
- One script automatically runs all tests and prints detailed results to aid debugging
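A minimal sketch of how a tolerance-based check and a per-layer report might fit together. The slides do not say which norm or error measure the checker uses, so the relative L2 error below is an assumption; the `TESTS` table, its field names, and the gradient values are hypothetical stand-ins for what the real tool reads from prototxt.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))


def relative_error(g_backprop, g_numeric):
    """L2 norm of the difference, scaled by the norms of both gradients."""
    diff = l2_norm([a - b for a, b in zip(g_backprop, g_numeric)])
    scale = l2_norm(g_backprop) + l2_norm(g_numeric)
    return diff / scale if scale > 0.0 else 0.0


# Hypothetical test table; the real checker reads its settings from prototxt.
TESTS = [
    {"layer": "good_layer", "tolerance": 1e-4,
     "backprop": [0.5, -1.0, 2.0], "numeric": [0.5000001, -1.0, 1.9999999]},
    {"layer": "buggy_layer", "tolerance": 1e-4,
     "backprop": [0.5, 1.0, 2.0], "numeric": [0.5, -1.0, 2.0]},
]


def run_all(tests):
    """Check every configured layer and print one report line per layer."""
    results = {}
    for t in tests:
        err = relative_error(t["backprop"], t["numeric"])
        passed = err <= t["tolerance"]
        results[t["layer"]] = passed
        status = "PASS" if passed else "FAIL"
        print("%-12s rel_err=%.3e tol=%.0e %s"
              % (t["layer"], err, t["tolerance"], status))
    return results


results = run_all(TESTS)
```

Scaling by the gradient norms keeps one tolerance meaningful across layers whose gradients differ by orders of magnitude.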

Parametric Rectified Linear Unit (PReLU): Implementation + Experimentation

Motivation + Hypothesis
- ReLU activation units are widely used in Deep Learning due to their desirable non-linearity properties
- PReLU improves on ReLU by providing a trainable parameter that adjusts the non-linearity
- Has shown significant results (ImageNet)
- Question: does this carry over to NLP?

Implementation
- LegoNet: modular design, relatively easy to add new Layer classes
- ReLU Layer: feed-forward; backpropagation to the inputs
- PReLU Layer: feed-forward; backpropagation to the PReLU params and to the inputs
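The forward and backward passes named on this slide can be sketched with the standard definitions, ReLU as max(0, x) and PReLU as max(0, x) + a * min(0, x). This is not the LegoNet code: the function names are hypothetical, and a single slope `a` shared by the whole layer is an assumption, since the slides do not say whether the parameter is per-unit or shared.

```python
def relu_forward(x):
    return [v if v > 0.0 else 0.0 for v in x]


def relu_backward(x, grad_out):
    # The gradient passes through only where the input was positive.
    return [g if v > 0.0 else 0.0 for v, g in zip(x, grad_out)]


def prelu_forward(x, a):
    # Identity for positive inputs, slope a for the rest.
    return [v if v > 0.0 else a * v for v in x]


def prelu_backward(x, a, grad_out):
    # Gradient w.r.t. inputs: 1 on the positive side, a otherwise.
    grad_in = [g if v > 0.0 else a * g for v, g in zip(x, grad_out)]
    # Gradient w.r.t. the shared slope a: sum of grad_out * x over x <= 0.
    grad_a = sum(g * v for v, g in zip(x, grad_out) if v <= 0.0)
    return grad_in, grad_a


x = [-2.0, 0.5, 3.0]
grad_out = [1.0, 1.0, 1.0]
print(relu_forward(x))
print(relu_backward(x, grad_out))
print(prelu_forward(x, 0.25))
print(prelu_backward(x, 0.25, grad_out))
```

With `a = 0` the PReLU functions reduce to ReLU, and with `a = 1` the layer becomes linear, which is why the learned value of `a` can be read as a measure of how non-linear a layer is.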

Experiments
- Context: 2-hidden-layer Simnet DNN used for similarity rankings between query-title pairs
- Baseline: 2-hidden-layer network with softsign activation function

Goals
- Compare ReLU and PReLU speed + accuracy improvement to the baseline softsign activation function
- Investigate the effect of the learning rate of PReLU parameter a
- Test the effectiveness of the PReLU non-linear initialization proposed in the paper vs. the current default initialization (Xavier initialization)
- Investigate network structures using ReLU with possible sparse output representations

ReLU, PReLU, compared to baseline
- Conclusion: ReLU Layer networks obtained worse results than the baseline; PReLU networks gave similar results, but more work is needed
- Future directions: more in-depth comparisons, including more tuning of PReLU networks

Initialization Comparison
- Conclusion: the PReLU non-linear initialization performed worse than the default init.
- Possible explanation: the init. was proposed for extremely deep CNNs used in image processing and may not be applicable here
- Future direction: investigate other types of initialization
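For reference, the two schemes being compared might look like the sketch below. Two assumptions are made here: that "the paper" refers to He et al., "Delving Deep into Rectifiers" (2015), whose initialization draws weights from a zero-mean Gaussian with variance 2 / ((1 + a^2) * fan_in), and that the Xavier default is the uniform variant with limit sqrt(6 / (fan_in + fan_out)). LegoNet's actual variants are not shown in the slides, and the function names are hypothetical.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


def prelu_gaussian(fan_in, fan_out, a, rng):
    """Assumed PReLU-aware init: N(0, std^2), std = sqrt(2 / ((1 + a^2) * fan_in))."""
    std = math.sqrt(2.0 / ((1.0 + a * a) * fan_in))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]


rng = random.Random(0)
fan_in, fan_out, a = 256, 256, 0.25

w_xavier = xavier_uniform(fan_in, fan_out, rng)
w_prelu = prelu_gaussian(fan_in, fan_out, a, rng)

xavier_limit = math.sqrt(6.0 / (fan_in + fan_out))
target_std = math.sqrt(2.0 / ((1.0 + a * a) * fan_in))
flat = [w for row in w_prelu for w in row]
sample_std = math.sqrt(sum(w * w for w in flat) / len(flat))
print("xavier limit:", xavier_limit)
print("prelu target std:", target_std, "sample std:", sample_std)
```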

PReLU learning rates
- Still running…

Sparse Outputs Network with ReLU
- Conclusion: unbalanced structures work significantly better than balanced versions, but still suffer an accuracy penalty compared to the baseline
- Future direction: further testing with ReLU additions to the network. Perhaps the learning rate (LR) is too low?

Experimentation is hard!
- First experience with doing research + experimentation on a large-scale dataset
- Many challenges: debugging difficulties, lack of experience with multi-threading, accidentally rm-ing directories…
- However, many lessons learned: working with big datasets, how to devise good experiments, lots and lots of shell scripts, etc.

LegoClassifyNet: Experiments with MNIST

MNIST Dataset
- Open-source dataset of handwritten digits from 0-9, widely used as a benchmark in Deep Learning
- Small, easy and fast to train & debug
- Investigated PReLU/ReLU effectiveness in a classification task
- Achieved over 98% test-set accuracy with 2-hidden-layer NNs with PReLU units, matching most published results

MNIST Experiments, #1
- Conclusion: ReLU/PReLU show improvement over other non-linear functions in the MNIST classification task, contrary to the click-data experiments
- PReLU converges slightly faster, with a similar result on the 2-hidden-layer NN

MNIST Experiments, #2
- Conclusion: bigger PReLU param learning rates lead to faster convergence on the MNIST dataset
- Future direction: more investigation of the effect of param a on NN learning rate + accuracy

MNIST Experiments, #3
- Conclusion: the PReLU non-linear init. has no significant effect compared to the baseline. This matches the result of the click-data experiments

MNIST Experiments, #4
- Conclusion: additional hidden layers seem to improve accuracy, but the improvement is not significant
- Future direction: testing even deeper NNs with other structures

MNIST Experiments, #5
- Conclusion: a values increase from the 1st PReLU Layer forward
- This corresponds to a steep non-linearity in the first layer, followed by strictly decreasing non-linearity in the following layers
- Future direction: more investigation into a values in various contexts

LegoClassifyNet
- Generalized the code used for MNIST classification to meet further needs for classification on LegoNet
- Implemented LegoClassifyNet, LegoClassifyTestNet classes
- Implemented new classify.cpp tool
- Wrote a wiki tutorial for working with MNIST in this framework, designed for first-time users

Network Visualizer

Graph Visualization
- Complex LegoNet configuration files (prototxts) need visualization
- Converts LegoNet prototxt format into renderable .dot texts
- Built in pure JS, directly embeddable into any webpage
- Utilizes open-source JS parsing + rendering libraries: Viz.js, pbparser.js, Google Image API (first version)
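The conversion step can be illustrated with a short sketch that turns a list of layer descriptions into DOT text. The real tool is written in JS and parses LegoNet prototxt; Python is used here only for illustration, and the already-parsed `layers` list, its field names (`name`, `type`, `inputs`), and the layer types shown are hypothetical.

```python
def layers_to_dot(layers):
    """Render a list of layer descriptions as Graphviz DOT text."""
    lines = ["digraph net {", "  rankdir=TB;"]
    for layer in layers:
        name, kind = layer["name"], layer["type"]
        # One box per layer, labelled with its name and type.
        lines.append('  "%s" [shape=box, label="%s\\n(%s)"];' % (name, name, kind))
    for layer in layers:
        # One edge from every layer this layer consumes.
        for src in layer.get("inputs", []):
            lines.append('  "%s" -> "%s";' % (src, layer["name"]))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical, already-parsed network description.
layers = [
    {"name": "data", "type": "Input", "inputs": []},
    {"name": "fc1", "type": "FullyConnected", "inputs": ["data"]},
    {"name": "prelu1", "type": "PReLU", "inputs": ["fc1"]},
    {"name": "softmax", "type": "Softmax", "inputs": ["prelu1"]},
]

dot = layers_to_dot(layers)
print(dot)
```

The resulting text can be handed to any DOT renderer, which is the role Viz.js plays in the browser.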

Output

Luda at Baidu

Thanks
- Mentor: 董大祥
- The LegoNet group
- The entire NLP-SC team