HLT, COLT, ICML, Coling-ACL, EMNLP - nobuyukishimizuの日記

HLT, COLT, ICML, Coling-ACL, EMNLPで、機械学習の視点から見て大事そうだな、と思った論文について書きます。　自然言語処理の分野で使えそうな structure のある話にバイアスがかかっているので悪しからず。　他にも良い論文はいっぱいありました。

=======
undirected graphical modelについては、近似を使って undirected graphical model を速く学習する方法についての論文が大事そうでした。　

Quadratic Programming Relaxations for Metric Labeling and Markov Random Field MAP Estimation
http://www.icml2006.org/icml_documents/camera-ready/093_Quadratic_Programmin.pdf

Efficient MAP approximation for dense energy functions
http://www.icml2006.org/icml_documents/camera-ready/069_Efficient_MAP_Approx.pdf

=======
structure がある物体についてのクラスタリングや、unsupervised/semi-supervised learningについての論文も大事そうでした。

Clustering Graphs by Weighted Substructure Mining
http://www.icml2006.org/icml_documents/camera-ready/120_Clustering_Graphs_by.pdf

Discriminative Unsupervised Learning of Structured Predictors
http://www.icml2006.org/icml_documents/camera-ready/133_Discriminative_Unsup.pdf

Semi-Supervised Learning for Structured Output Variables
http://www.icml2006.org/icml_documents/camera-ready/019_Semi_Supervised_Lear.pdf

=======
classification & similarity

一応、分類問題なんですが、feature space にある similarityのコンセプトを使っている所が面白いと思いました。　semi-supervised や、constraint clusteringも出てきていますし、少しづつ、クラスタリングと分類の境界が薄れてきている感じがします。　

On a Theory of Kernels as Similarity Functions
http://www.icml2006.org/icml_documents/camera-ready/010_On_a_Theory_of_Learn.pdf

Local Fisher Discriminant Analysis for Supervised Dimensionality Reduction
http://www.icml2006.org/icml_documents/camera-ready/114_Local_Fisher_Discrim.pdf

=======
sequential labeling に特化した論文

少ないデータで学習する系統

Prototypeを使った Unsupervised Learning。　
Prototype-Driven Learning for Sequence Models
http://www.cs.berkeley.edu/~aria42/pubs/naacl06-posinduction.pdf

Semi-Supervised Conditional Random Fields for Improved Sequence Segmentation and Labeling
http://www.cs.ualberta.ca/~fjiao/acl2006.pdf

Semi-Markov を速くする系統

Improving the Scalability of Semi-Markov Conditional Random Fields for Named Entity Recognition
http://www-tsujii.is.s.u-tokyo.ac.jp/~hillbig/papers/acl2006_semicrf.pdf

Efficient inference on sequence segmentation　models
http://www.icml2006.org/icml_documents/camera-ready/100_Efficient_Inference.pdf

F-measureが大事

Training Conditional Random Fields with　Multivariate Evaluation Measures
http://acl.ldc.upenn.edu/P/P06/P06-1028.pdf

NER Systems that Suit Users Preferences: Adjusting the
Recall-Precision Trade-off for Entity Extraction
http://www.cs.cmu.edu/~wcohen/postscript/hlt2006.pdf

その他

Segment-based Hidden Markov Models for Information Extraction
http://acl.ldc.upenn.edu/P/P06/P06-1061.pdf

An Effective Two-Stage Model for Exploiting Non-Local Dependencies in Named Entity Recognition
http://acl.ldc.upenn.edu/P/P06/P06-1141.pdf

=======
Decoding

Strucuted Output Predictionの、Decodingを直して欲しい情報を取り出すための論文です。

Integer Linear Programming 系統
Viterbiのかわりに　ILPを使って、もっとグローバルなConstraintsを入れてもDecoding出来るようにしよう、というやりかたなんですが、今年は結構多いです。

2005年の論文
Integer Linear Programming Inference for Conditional Random Fields
http://l2r.cs.uiuc.edu/~danr/Papers/RothYi05.pdf

ここから2006年の論文で、係受け解析に使う話。
Incremental Integer Linear Programming for Non-projective Dependency Parsing
http://sebrie.freehostia.com/cms/publications/emnlp06-riedel-clarke.fix.pdf

機械翻訳で必要になる、Word Alignmentの方に持ってくる話。
Word Alignment via Quadratic Assignment
http://www.cs.berkeley.edu/~taskar/pubs/naacl06_qap.pdf

ILPよりも、Finite State Machineを使って、もっと速くやろうという話。
A fast finite-state relaxation method for enforcing global constraints on sequence decoding
http://nlp.cs.jhu.edu/~royt/hlt-naacl-2006.pdf

ILPとfinite-state系統以外のdecodingを含めた学習方は　Hal Daume III の A* を使ったものや、searnがあります。　賢く検索できるように学習すれば、速くできる。　したがって、ややこしい search space でも探せるという感じでしょうか。

少し毛色を変えて、sequenceの途中でもdecodeする話。
Online Decoding of Markov Models with Latency Constraints
http://www.icml2006.org/icml_documents/camera-ready/083_Online_Decoding_of_M.pdf

=======
その他、ICMLで少し変わっていて面白いと思った論文集

Algorithms for Portfolio Management based on the Newton Method
http://www.icml2006.org/icml_documents/camera-ready/002_Algorithms_for_Portf.pdf

The Relationship Between Precision-Recall and ROC Curves
http://www.icml2006.org/icml_documents/camera-ready/030_The_Relationship_Bet.pdf

Nightmare at test time: Robust learning by feature deletion
http://www.icml2006.org/icml_documents/camera-ready/045_Nightmare_at_Test_Ti.pdf

Maximum Margin Planning
http://www.ri.cmu.edu/pubs/pub_5405.html

=======
つぎは、convexity vs non-convexityについて。

学習する際に、convexityは必要ない、と主張している論文があります。　プレゼンテーションで、hinge loss では、decision boudaryのそばにあるsampleだけでなく、すごく遠くの方にあるoutlayerみたいなsampleもsupport vector になってしまう。それだと逆に accuracy が落ちてしまうから、convexityは忘れて、global minima より、accuracyが高くなるlocal minimaに行くべきだ、と言っていました。
Trading Convexity for Scalability
http://www.icml2006.org/icml_documents/camera-ready/026_Trading_Convexity_fo.pdf

また、Predictive Uncertainty in Environmental Modelling Competition
http://theoval.cmp.uea.ac.uk/~gcc/competition/
に勝ったアルゴリズムには local minimaがあります。

Variable noise and dimensionality reduction for sparse Gaussian processes
http://www.gatsby.ucl.ac.uk/~snelson/snelson_uai.pdf

元のalgorithmがconvexでも、semi-supervised や unsupervisedにしたり、hidden variableをつけるとconvexityが失われることが多いので、なんというか、ある種の良い non-convexityというものがあるような気がします。

=======
最適化の方法。　

スタンダードな教科書は多分、
Numerical Optimization (Nocedal & Wright)
http://www.ece.northwestern.edu/~nocedal/book/num-opt.html

SMD (Stochastic Meta Decent)
特に最近はやっているように見えます。　

http://users.rsise.anu.edu.au/~nici/pubs/mvp.pdf
http://users.rsise.anu.edu.au/~nici/bib2html/b2hd-SMD_60min.html

SVMを速くする (JMLR)
http://users.rsise.anu.edu.au/~nici/pubs/SVMDjmlr.pdf

強化学習を速くする (NIPS05)
http://users.rsise.anu.edu.au/~nici/pubs/nips05.pdf

Conditional Random Fieldsを速くする (ICML06)
http://users.rsise.anu.edu.au/~nici/pubs/crfsmd.pdf

Video Tutorial
http://seminars.ijs.si/pascal/2005/mlss%5Fcanberra/
http://seminars.ijs.si/pascal/2006/mlss06%5Fcanberra/

Hidden Variableを入れたり、Semi-supervised にしてUnlabeled Sample を使うとObjectiveがConvexでなくなってしまうことが多いんですが、そのときにどうするか、という話です。　

Continuation Methods
http://www.icml2006.org/icml_documents/camera-ready/024_A_Continuation_Metho.pdf

それから、上でも出てきたTrading Convexityの論文の、Concave-Convex Procedureがあります。

=======
Directed Graphical Modelの系統

Indian Buffet Processを使った論文
このChoice Modelでは、MCMCのやり方できちんとExchangabilityを使えば、サンプリングがもっと正しくできる、という話が出ていました。
http://www.icml2006.org/icml_documents/camera-ready/046_A_Choice_Model_with.pdf

パチンコ Allocation
http://www.cs.umass.edu/~mccallum/papers/pam-icml06.pdf

Clustering Documents with an Exponential-Family Approximation of the Dirichlet Compound Multinomial Distribution
http://www.icml2006.org/icml_documents/camera-ready/037_Clustering_Documents.pdf

Topic Modeling: Beyond Bag-of-Words
http://www.icml2006.org/icml_documents/camera-ready/123_Topic_Modeling_Beyon.pdf

Language Modelですが、Kneser-Ney という、多分一番効果的な smoothing methodを Graphical Modelと関連づけるすごい論文です。
A Hierarchical Bayesian Language Model based on Pitman-Yor Processes
http://acl.ldc.upenn.edu/P/P06/P06-1124.pdf

ここからは単純な Topic Model以外の Graphical Modelです。

次は機械翻訳に出てくる問題の、Word Alignmentを、言語A -> Bへのモデルと、B -> A へのモデルをjoint graphical modelで統合して、Unsupervisedで発見する方法です。　綺麗です。
Alignment by Agreement
http://www.cs.berkeley.edu/~pliang/papers/alignment-naacl2006.pdf
http://www.cs.berkeley.edu/~pliang/papers/alignment-naacl2006-talk.pdf

Segmentation & LabelingをUnsupervised Graphical Modelでやる話です。
Unsupervised Topic Modelling for Multi-Party Spoken Discourse
http://acl.ldc.upenn.edu/P/P06/P06-1003.pdf

Bayesian Query-Focused Summarization
http://acl.ldc.upenn.edu/P/P06/P06-1039.pdf

Contextual Dependencies in Unsupervised Word Segmentation
http://acl.ldc.upenn.edu/P/P06/P06-1085.pdf

=======
機械翻訳と係り受け解析で面白いと思った論文。

Prototype-Driven Grammar Induction
http://www.cs.berkeley.edu/~aria42/pubs/acl06-grammarinduction

A Discriminative Global Training Algorithm for Statistical MT
http://acl.ldc.upenn.edu/P/P06/P06-1091.pdf

An End-to-End Discriminative Approach to Machine Translation
http://acl.ldc.upenn.edu/P/P06/P06-1096.pdf

Left-to-Right Target Generation for Hierarchical Phrase-based Translation
http://acl.ldc.upenn.edu/P/P06/P06-1098.pdf

Advances in Discriminative Parsing
http://acl.ldc.upenn.edu/P/P06/P06-1110.pdf