Crawl and Visualize ICLR 2020 OpenReview Data
Descriptions
This Jupyter Notebook contains the data crawled from the ICLR 2020 OpenReview webpages, along with visualizations of that data. The list of submissions (sorted by average rating) can be found here.
Prerequisites
Visualizations
Rating distribution
The distribution of reviewer ratings centers around 4 (mean: 4.1822).
The cumulative sum of reviewer ratings.
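As a rough sketch of how the two plots above could be computed, assume the individual reviewer ratings are available as a flat array. The `ratings` values below are made-up sample data, not the crawled ICLR data; they only use the values 1, 3, 6, and 8 that appear in the ratings table in this document.

```python
import numpy as np

# Hypothetical sample of individual reviewer ratings (not the real crawl).
ratings = np.array([1, 3, 3, 3, 6, 6, 8, 3, 6, 1])

# Count how often each distinct rating value occurs.
values, counts = np.unique(ratings, return_counts=True)

# Running total of reviews with a rating less than or equal to each value.
cumulative = np.cumsum(counts)

for v, c, cum in zip(values, counts, cumulative):
    print(f"rating {v}: {c} reviews, {cum} cumulative")
print(f"mean rating: {ratings.mean():.4f}")
```

The `counts` array corresponds to the rating distribution and `cumulative` to the cumulative sum; either can be passed to `plt.bar` for plotting.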
You can compute how many papers are beaten by yours with
import numpy as np

# See how many papers are beaten by yours
def PR(rating_mean, your_rating):
    # Percentage of papers whose average rating is strictly lower than yours
    pr = np.sum(your_rating > np.array(rating_mean)) / len(rating_mean) * 100
    # Percentage of papers with exactly the same average rating
    same_rating = np.sum(your_rating == np.array(rating_mean)) / len(rating_mean) * 100
    return pr, same_rating

my_rating = (6 + 6 + 6) / 3.  # your average rating here
# rating_mean: list of per-paper average ratings from the crawled data
pr, same_rating = PR(rating_mean, my_rating)
print('Your paper ({:.2f}) is among the top {:.2f}% of submissions based on the ratings.\n'
      'There are {:.2f}% with the same rating.'.format(
          my_rating, 100 - pr, same_rating))
# accept rate orals posters
# ICLR 2017: 39.1% (198/507) 15 183
# ICLR 2018: 32.0% (314/981) 23 291
# ICLR 2019: 31.4% (500/1591) 24 476
# ICLR 2020: ? (?/2594)
[Output]
Your paper (6.00) is among the top 21.67% of submissions based on the ratings.
There are 8.12% with the same rating.
Word clouds
The word clouds formed by the keywords of submissions show the hot topics, including deep learning, reinforcement learning, representation learning, generative models, graph neural network, etc.
This figure is plotted with the Python word cloud generator:
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# keywords: list of keyword strings collected from all submissions
wordcloud = WordCloud(max_font_size=64, max_words=160,
                      width=1280, height=640,
                      background_color="black").generate(' '.join(keywords))
plt.figure(figsize=(16, 8))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
Frequent keywords
The top 50 common keywords and their frequency.
The average reviewer ratings and the frequency of keywords suggest that, to maximize your chance of getting higher ratings, you could use keywords such as compositionality, deep learning theory, or gradient descent.
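The keyword statistics above can be sketched as follows, assuming each submission is reduced to a list of keywords and an average rating. The `papers` list is invented sample data and the variable names are illustrative, not taken from the notebook.

```python
from collections import Counter, defaultdict

# Hypothetical sample: (keywords, average rating) for each submission.
papers = [
    (["deep learning", "gradient descent"], 6.0),
    (["reinforcement learning"], 4.0),
    (["Deep Learning", "reinforcement learning"], 3.0),
    (["gradient descent"], 8.0),
]

freq = Counter()                 # number of papers using each keyword
rating_sum = defaultdict(float)  # sum of average ratings per keyword

for keywords, rating in papers:
    # Lowercase and de-duplicate so a keyword counts once per paper.
    for kw in {k.lower() for k in keywords}:
        freq[kw] += 1
        rating_sum[kw] += rating

avg_rating = {kw: rating_sum[kw] / freq[kw] for kw in freq}

# Top 50 keywords by frequency, with the mean rating of papers using them.
for kw, n in freq.most_common(50):
    print(f"{kw}: {n} papers, average rating {avg_rating[kw]:.2f}")
```

Keywords that appear in only a handful of papers give noisy averages, so it may be worth filtering on a minimum frequency before reading much into the ranking.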
Review length histogram
The average review length is 407.91 words. The histogram is as follows.
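A minimal sketch of the review length computation, assuming that length is measured in whitespace-separated words (the notebook may tokenize differently). The review texts and the bin edges are placeholders.

```python
import numpy as np

# Hypothetical review texts standing in for the crawled reviews.
reviews = [
    "The paper is well written and the experiments are convincing.",
    "I have concerns about the baselines.",
    "Strong accept.",
]

# Word count per review, splitting on whitespace.
lengths = np.array([len(r.split()) for r in reviews])
mean_length = lengths.mean()

# Histogram counts for the bins [0, 5), [5, 10), [10, 15].
counts, edges = np.histogram(lengths, bins=[0, 5, 10, 15])

print(f"average review length: {mean_length:.2f} words")
print(dict(zip(edges[:-1].tolist(), counts.tolist())))
```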
Reviewer rating change during the rebuttal period
All individual ratings:
The average rating for each paper:
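The per-paper change can be sketched as the difference between the mean of the new ratings and the mean of the old ratings. The sample rows are copied from the ratings table in this document, under the assumption that the Old Ratings column holds the scores from before the rebuttal. Comparing means rather than individual scores also handles papers whose number of reviews changed.

```python
import numpy as np

# (title, old ratings, current ratings), copied from the ratings table.
papers = [
    ("Differentiation Of Blackbox Combinatorial Solvers", [6, 8, 6], [8, 8, 8]),
    ("Cater: A Diagnostic Dataset For Compositional Actions & Temporal Reasoning", [8, 8, 8], [8, 8, 8]),
    ("Mogrifier Lstm", [3, 8, 8], [6, 8, 8]),
    ("Federated Learning With Matched Averaging", [6, 6], [6, 8, 8]),
]

def mean_change(old, new):
    """Change in the average rating; works even if the review count differs."""
    return float(np.mean(new) - np.mean(old))

changes = {title: mean_change(old, new) for title, old, new in papers}

for title, delta in changes.items():
    print(f"{delta:+.2f}  {title}")
```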
How it works
See How to install Selenium and ChromeDriver on Ubuntu.
To crawl data from dynamic websites such as OpenReview, a headless browser can be created with
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

executable_path = '/Users/waltersun/Desktop/chromedriver'  # path to your ChromeDriver executable
options = Options()
options.add_argument("--headless")  # run Chrome without opening a window
# Selenium 3 style; Selenium 4 deprecates executable_path in favor of a Service object
browser = webdriver.Chrome(options=options, executable_path=executable_path)
Then, we can get the content from a webpage.
To know what content to crawl, we need to inspect the webpage layout.
I chose to get the content by
# Selenium 3 style; in Selenium 4 use browser.find_elements(By.CLASS_NAME, ...)
key = browser.find_elements_by_class_name("note_content_field")
value = browser.find_elements_by_class_name("note_content_value")
The data includes the abstract, keywords, TL;DR, and comments.
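One way to turn the two element lists into a record is to pair each field label with its value. The sketch below uses plain strings in place of the `.text` of each Selenium element so that it runs without a browser; the label strings and the `to_record` helper are assumptions for illustration, not part of the notebook.

```python
# Stand-ins for [k.text for k in key] and [v.text for v in value].
# The exact label strings shown on the page are an assumption.
key_texts = ["Keywords:", "TL;DR:", "Abstract:"]
value_texts = ["deep learning, optimization", "A short summary.", "We study ..."]

def to_record(keys, values):
    """Pair field labels with their values and split the keyword string."""
    record = {}
    for k, v in zip(keys, values):
        name = k.strip().rstrip(":").lower()  # "Keywords:" -> "keywords"
        record[name] = v.strip()
    if "keywords" in record:
        record["keywords"] = [
            w.strip().lower() for w in record["keywords"].split(",") if w.strip()
        ]
    return record

record = to_record(key_texts, value_texts)
print(record)
```

Because `zip` stops at the shorter list, a page where the two lists differ in length will silently drop fields, so checking `len(key) == len(value)` first is a reasonable safeguard.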
Installing Selenium and ChromeDriver on Ubuntu
The following content is largely borrowed from a nice post written by Christopher Su.
Install Google Chrome for Debian/Ubuntu
sudo apt-get install libxss1 libappindicator1 libindicator7
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome*.deb
sudo apt-get install -f
Install xvfb to run Chrome on a headless device
sudo apt-get install xvfb
Install ChromeDriver for 64-bit Linux
sudo apt-get install unzip # If you don't have unzip package
wget -N http://chromedriver.storage.googleapis.com/2.26/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
chmod +x chromedriver
sudo mv -f chromedriver /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
If your system is 32-bit, please find the ChromeDriver releases here and modify the above download command.
Install Python dependencies (Selenium and pyvirtualdisplay)
pip install pyvirtualdisplay selenium
Test your setup in Python
from pyvirtualdisplay import Display
from selenium import webdriver

# Start a virtual display so Chrome can run without a physical screen
display = Display(visible=0, size=(1024, 1024))
display.start()
browser = webdriver.Chrome()
browser.get('http://shaohua0116.github.io/')
print(browser.title)
print(browser.find_element_by_class_name('bio').text)
All ICLR 2020 OpenReview data
Collected at 12/05/2019 12:05:51 AM
Number of submissions: 2594 (withdrawn/desk reject submissions: 367)
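The Average Rating and Variance columns of the table can be reproduced from the current ratings with NumPy. Judging only from the listed numbers, the Variance column appears to match the population standard deviation (`np.std`) rather than the variance; this is an observation about the values, not a statement about the notebook's code. The `summarize` helper is hypothetical.

```python
import numpy as np

def summarize(ratings):
    """Mean, population variance, and population standard deviation."""
    r = np.asarray(ratings, dtype=float)
    return {"mean": r.mean(), "var": r.var(), "std": r.std()}

# Current ratings copied from three rows of the table.
rows = {
    "Vq-wav2vec (listed: 7.50, 0.87)": [8, 6, 8, 8],
    "Mogrifier Lstm (listed: 7.33, 0.94)": [6, 8, 8],
    "Language Gans Falling Short (listed: 7.00, 1.00)": [6, 8],
}

for label, ratings in rows.items():
    s = summarize(ratings)
    print(f"{label}: mean={s['mean']:.2f} var={s['var']:.2f} std={s['std']:.2f}")
```

For ratings 8, 6, 8, 8 the variance is 0.75 while the standard deviation rounds to 0.87, which is the value shown in the table.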
| Rank | Average Rating | Title | Old Ratings | Ratings | Variance |
| --- | --- | --- | --- | --- | --- |
| 1 | 8.00 | Differentiation Of Blackbox Combinatorial Solvers | 6, 8, 6 | 8, 8, 8 | 0.00 |
| 2 | 8.00 | Enhancing Adversarial Defense By K-winners-take-all | 8, 6, 8 | 8, 8, 8 | 0.00 |
| 3 | 8.00 | Freelb: Enhanced Adversarial Training For Language Understanding | 8, 8 | 8, 8 | 0.00 |
| 4 | 8.00 | Geometric Analysis Of Nonconvex Optimization Landscapes For Overcomplete Learning | 6, 8, 8 | 8, 8, 8 | 0.00 |
| 5 | 8.00 | Why Gradient Clipping Accelerates Training: A Theoretical Justification For Adaptivity | 6, 6, 8 | 8, 8, 8 | 0.00 |
| 6 | 8.00 | Cater: A Diagnostic Dataset For Compositional Actions & Temporal Reasoning | 8, 8, 8 | 8, 8, 8 | 0.00 |
| 7 | 8.00 | Data-dependent Gaussian Prior Objective For Language Generation | 6, 8, 6 | 8, 8, 8 | 0.00 |
| 8 | 8.00 | Implementation Matters In Deep Rl: A Case Study On Ppo And Trpo | 6, 8, 6 | 8, 8, 8 | 0.00 |
| 9 | 8.00 | Simplified Action Decoder For Deep Multi-agent Reinforcement Learning | 8, 3, 3 | 8, 8, 8 | 0.00 |
| 10 | 8.00 | On The "steerability" Of Generative Adversarial Networks | 8, 8, 8 | 8, 8, 8 | 0.00 |
| 11 | 8.00 | Understanding And Robustifying Differentiable Architecture Search | 8, 8, 6 | 8, 8, 8 | 0.00 |
| 12 | 8.00 | Gendice: Generalized Offline Estimation Of Stationary Values | 8, 8, 8 | 8, 8, 8 | 0.00 |
| 13 | 8.00 | An Algorithm-agnostic Nas Benchmark | 8, 6, 8 | 8, 8, 8 | 0.00 |
| 14 | 8.00 | Causal Discovery With Reinforcement Learning | 6, 6, 6 | 8, 8, 8 | 0.00 |
| 15 | 8.00 | Restricting The Flow: Information Bottlenecks For Attribution | 8, 6, 8 | 8, 8, 8 | 0.00 |
| 16 | 8.00 | Sparse Coding With Gated Learned Ista | 8, 8, 3 | 8, 8, 8 | 0.00 |
| 17 | 8.00 | Hyper-sagnn: A Self-attention Based Graph Neural Network For Hypergraphs | 6, 6 | 8, 8 | 0.00 |
| 18 | 8.00 | Smooth Markets: A Basic Mechanism For Organizing Gradient-based Learners | 8, 8 | 8, 8 | 0.00 |
| 19 | 8.00 | The Logical Expressiveness Of Graph Neural Networks | 6, 8, 8 | 8, 8, 8 | 0.00 |
| 20 | 8.00 | Learning To Balance: Bayesian Meta-learning For Imbalanced And Out-of-distribution Tasks | 8, 8, 6 | 8, 8, 8 | 0.00 |
| 21 | 8.00 | Differentiable Reasoning Over A Virtual Knowledge Base | 6, 8, 8 | 8, 8, 8 | 0.00 |
| 22 | 8.00 | Meta-learning With Warped Gradient Descent | 6, 8, 8 | 8, 8, 8 | 0.00 |
| 23 | 8.00 | How Much Position Information Do Convolutional Neural Networks Encode? | 8, 6, 8 | 8, 8, 8 | 0.00 |
| 24 | 8.00 | A Generalized Training Approach For Multiagent Learning | 3, 6, 6 | 8, 8, 8 | 0.00 |
| 25 | 8.00 | Depth-width Trade-offs For Relu Networks Via Sharkovsky's Theorem | 8, 8 | 8, 8 | 0.00 |
| 26 | 8.00 | Contrastive Learning Of Structured World Models | 6, 6, 8 | 8, 8, 8 | 0.00 |
| 27 | 8.00 | Mathematical Reasoning In Latent Space | 8, 6, 6 | 8, 8, 8 | 0.00 |
| 28 | 8.00 | Rotation-invariant Clustering Of Functional Cell Types In Primary Visual Cortex | 8, 3, 6 | 8, 8, 8 | 0.00 |
| 29 | 8.00 | Mirror-generative Neural Machine Translation | 8, 8, 6 | 8, 8, 8 | 0.00 |
| 30 | 8.00 | A Theory Of Usable Information Under Computational Constraints | 8, 8 | 8, 8 | 0.00 |
| 31 | 8.00 | Principled Weight Initialization For Hypernetworks | 8, 6, 6 | 8, 8, 8 | 0.00 |
| 32 | 8.00 | Optimal Strategies Against Generative Attacks | 6, 8, 8, 8 | 8, 8, 8, 8 | 0.00 |
| 33 | 8.00 | Backpack: Packing More Into Backprop | 8, 8, 8 | 8, 8, 8 | 0.00 |
| 34 | 8.00 | Dynamics-aware Unsupervised Skill Discovery | 8, 8, 8 | 8, 8, 8 | 0.00 |
| 35 | 7.50 | Vq-wav2vec: Self-supervised Learning Of Discrete Speech Representations | 8, 6, 8, 6 | 8, 6, 8, 8 | 0.87 |
| 36 | 7.50 | Rna Secondary Structure Prediction By Learning Unrolled Algorithms | 8, 8, 6 | 8, 8, 8, 6 | 0.87 |
| 37 | 7.33 | Your Classifier Is Secretly An Energy Based Model And You Should Treat It Like One | 3, 8, 8 | 6, 8, 8 | 0.94 |
| 38 | 7.33 | Measuring The Reliability Of Reinforcement Learning Algorithms | 6, 6, 6 | 8, 8, 6 | 0.94 |
| 39 | 7.33 | Deep Batch Active Learning By Diverse, Uncertain Gradient Lower Bounds | 8, 3, 8 | 8, 6, 8 | 0.94 |
| 40 | 7.33 | Federated Learning With Matched Averaging | 6, 6 | 6, 8, 8 | 0.94 |
| 41 | 7.33 | Comparing Fine-tuning And Rewinding In Neural Network Pruning | 6, 6, 3 | 8, 6, 8 | 0.94 |
| 42 | 7.33 | Symplectic Ode-net: Learning Hamiltonian Dynamics With Control | 3, 8, 6 | 6, 8, 8 | 0.94 |
| 43 | 7.33 | On The Convergence Of Fedavg On Non-iid Data | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 44 | 7.33 | Mixed-curvature Variational Autoencoders | 6, 6, 8 | 6, 8, 8 | 0.94 |
| 45 | 7.33 | Robust Subspace Recovery Layer For Unsupervised Anomaly Detection | 3, 8, 6 | 6, 8, 8 | 0.94 |
| 46 | 7.33 | Polylogarithmic Width Suffices For Gradient Descent To Achieve Arbitrarily Small Test Error With Shallow Relu Networks | 8, 3, 8 | 8, 6, 8 | 0.94 |
| 47 | 7.33 | A Closer Look At Deep Policy Gradients | 6, 6, 8 | 8, 6, 8 | 0.94 |
| 48 | 7.33 | Deep Network Classification By Scattering And Homotopy Dictionary Learning | 8, 3, 6 | 8, 8, 6 | 0.94 |
| 49 | 7.33 | Deep Imitative Models For Flexible Inference, Planning, And Control | 6, 6, 6 | 8, 6, 8 | 0.94 |
| 50 | 7.33 | Classification-based Anomaly Detection For General Data | 8, 8, 3 | 8, 8, 6 | 0.94 |
| 51 | 7.33 | Deep Learning For Symbolic Mathematics | 6, 8, 6 | 8, 8, 6 | 0.94 |
| 52 | 7.33 | Ted: A Pretrained Unsupervised Summarization Model With Theme Modeling And Denoising | 6, 6, 3 | 6, 8, 8 | 0.94 |
| 53 | 7.33 | Fasterseg: Searching For Faster Real-time Semantic Segmentation | 3, 8, 3 | 6, 8, 8 | 0.94 |
| 54 | 7.33 | Latent Normalizing Flows For Many-to-many Cross Domain Mappings | 6, 6, 8 | 6, 8, 8 | 0.94 |
| 55 | 7.33 | At Stability's Edge: How To Adjust Hyperparameters To Preserve Minima Selection In Asynchronous Training Of Neural Networks? | 6, 8 | 8, 6, 8 | 0.94 |
| 56 | 7.33 | Albert: A Lite Bert For Self-supervised Learning Of Language Representations | 8, 8, 6 | 8, 8, 6 | 0.94 |
| 57 | 7.33 | Finite Depth And Width Corrections To The Neural Tangent Kernel | 3, 8, 8 | 6, 8, 8 | 0.94 |
| 58 | 7.33 | On Mutual Information Maximization For Representation Learning | 8, 8, 6 | 8, 8, 6 | 0.94 |
| 59 | 7.33 | Compressive Transformers For Long-range Sequence Modelling | 3, 6, 6 | 6, 8, 8 | 0.94 |
| 60 | 7.33 | Truth Or Backpropaganda? An Empirical Investigation Of Deep Learning Theory | 6, 6, 8 | 8, 6, 8 | 0.94 |
| 61 | 7.33 | What Graph Neural Networks Cannot Learn: Depth Vs Width | 6, 1, 8 | 8, 6, 8 | 0.94 |
| 62 | 7.33 | Sumo: Unbiased Estimation Of Log Marginal Probability For Latent Variable Models | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 63 | 7.33 | Directional Message Passing For Molecular Graphs | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 64 | 7.33 | Learning Robust Representations Via Multi-view Information Bottleneck | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 65 | 7.33 | Scaling Autoregressive Video Models | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 66 | 7.33 | Massively Multilingual Sparse Word Representations | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 67 | 7.33 | Mogrifier Lstm | 3, 8, 8 | 6, 8, 8 | 0.94 |
| 68 | 7.33 | Ddsp: Differentiable Digital Signal Processing | 8, 6, 8 | 8, 6, 8 | 0.94 |
| 69 | 7.33 | Low-resource Knowledge-grounded Dialogue Generation | 3, 8, 8 | 6, 8, 8 | 0.94 |
| 70 | 7.33 | Reconstructing Continuous Distributions Of 3d Protein Structure From Cryo-em Images | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 71 | 7.33 | Network Deconvolution | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 72 | 7.33 | A Mutual Information Maximization Perspective Of Language Representation Learning | 6, 6, 8 | 6, 8, 8 | 0.94 |
| 73 | 7.33 | On The Equivalence Between Node Embeddings And Structural Graph Representations | 6, 8, 6 | 6, 8, 8 | 0.94 |
| 74 | 7.33 | Lambdanet: Probabilistic Type Inference Using Graph Neural Networks | 3, 8, 6 | 6, 8, 8 | 0.94 |
| 75 | 7.33 | Graphzoom: A Multi-level Spectral Approach For Accurate And Scalable Graph Embedding | 3, 3 | 8, 8, 6 | 0.94 |
| 76 | 7.33 | Intensity-free Learning Of Temporal Point Processes | 8, 6, 8 | 8, 6, 8 | 0.94 |
| 77 | 7.33 | Neural Network Branching For Neural Network Verification | 6, 3, 8 | 8, 6, 8 | 0.94 |
| 78 | 7.33 | Energy-based Models For Atomic-resolution Protein Conformations | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 79 | 7.33 | Progressive Learning And Disentanglement Of Hierarchical Representations | 1, 6, 8 | 6, 8, 8 | 0.94 |
| 80 | 7.33 | Adversarial Training And Provable Defenses: Bridging The Gap | 8, 6, 8 | 8, 6, 8 | 0.94 |
| 81 | 7.33 | Meta-q-learning | 3, 6, 3 | 8, 8, 6 | 0.94 |
| 82 | 7.33 | Harnessing The Power Of Infinitely Wide Deep Nets On Small-data Tasks | 8, 6, 8 | 8, 6, 8 | 0.94 |
| 83 | 7.33 | The Ingredients Of Real World Robotic Reinforcement Learning | 6, 8, 6 | 6, 8, 8 | 0.94 |
| 84 | 7.33 | Online And Stochastic Optimization Beyond Lipschitz Continuity: A Riemannian Approach | 8, 8, 6 | 8, 8, 6 | 0.94 |
| 85 | 7.33 | Electra: Pre-training Text Encoders As Discriminators Rather Than Generators | 6, 8, 3 | 8, 8, 6 | 0.94 |
| 86 | 7.33 | Watch The Unobserved: A Simple Approach To Parallelizing Monte Carlo Tree Search | 6, 3, 8 | 8, 6, 8 | 0.94 |
| 87 | 7.33 | Generalization Of Two-layer Neural Networks: An Asymptotic Viewpoint | 8, 6, 8 | 8, 6, 8 | 0.94 |
| 88 | 7.33 | Convolutional Conditional Neural Processes | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 89 | 7.33 | Fast Task Inference With Variational Intrinsic Successor Features | 8, 3, 8 | 8, 6, 8 | 0.94 |
| 90 | 7.33 | Fspool: Learning Set Representations With Featurewise Sort Pooling | 6, 8, 3 | 8, 8, 6 | 0.94 |
| 91 | 7.33 | Seed Rl: Scalable And Efficient Deep-rl With Accelerated Central Inference | 8, 6, 8 | 8, 6, 8 | 0.94 |
| 92 | 7.33 | Meta-learning Acquisition Functions For Transfer Learning In Bayesian Optimization | 8, 3, 3 | 8, 6, 8 | 0.94 |
| 93 | 7.33 | Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps | 8, 8, 6 | 8, 8, 6 | 0.94 |
| 94 | 7.33 | Doubly Robust Bias Reduction In Infinite Horizon Off-policy Estimation | 6, 8, 6 | 6, 8, 8 | 0.94 |
| 95 | 7.33 | Meta-learning Without Memorization | 6, 3, 8 | 8, 6, 8 | 0.94 |
| 96 | 7.33 | Thieves On Sesame Street! Model Extraction Of Bert-based Apis | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 97 | 7.33 | Physics-aware Difference Graph Networks For Sparsely-observed Dynamics | 8, 8, 6 | 8, 8, 6 | 0.94 |
| 98 | 7.33 | Graph Neural Networks Exponentially Lose Expressive Power For Node Classification | 8, 6, 8 | 8, 6, 8 | 0.94 |
| 99 | 7.33 | Reformer: The Efficient Transformer | 8, 6, 6 | 8, 8, 6 | 0.94 |
| 100 | 7.33 | Sequential Latent Knowledge Selection For Knowledge-grounded Dialogue | 8, 6, 6 | 8, 8, 6 | 0.94 |
| 101 | 7.33 | When Do Variational Autoencoders Know What They Don't Know? | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 102 | 7.33 | Learning To Plan In High Dimensions Via Neural Exploration-exploitation Trees | 8, 6, 6 | 8, 8, 6 | 0.94 |
| 103 | 7.33 | Program Guided Agent | 6, 1, 6 | 8, 6, 8 | 0.94 |
| 104 | 7.33 | Symplectic Recurrent Neural Networks | 8, 6, 3 | 8, 8, 6 | 0.94 |
| 105 | 7.33 | Observational Overfitting In Reinforcement Learning | 3, 8, 8 | 6, 8, 8 | 0.94 |
| 106 | 7.33 | Cyclical Stochastic Gradient Mcmc For Bayesian Deep Learning | 6, 6, 8 | 6, 8, 8 | 0.94 |
| 107 | 7.33 | Discriminative Particle Filter Reinforcement Learning For Complex Partial Observations | 8, 6, 6 | 8, 6, 8 | 0.94 |
| 108 | 7.33 | High Fidelity Speech Synthesis With Adversarial Networks | 8, 6, 8 | 8, 6, 8 | 0.94 |
| 109 | 7.33 | Assemblenet: Searching For Multi-stream Neural Connectivity In Video Architectures | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 110 | 7.33 | Harnessing Structures For Value-based Planning And Reinforcement Learning | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 111 | 7.33 | Learning Hierarchical Discrete Linguistic Units From Visually-grounded Speech | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 112 | 7.33 | Cross-lingual Alignment Vs Joint Training: A Comparative Study And A Simple Unified Framework | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 113 | 7.33 | Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency | 3, 8, 8 | 6, 8, 8 | 0.94 |
| 114 | 7.33 | Glad: Learning Sparse Graph Recovery | 6, 6 | 8, 6, 8 | 0.94 |
| 115 | 7.33 | Disagreement-regularized Imitation Learning | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 116 | 7.33 | Is A Good Representation Sufficient For Sample Efficient Reinforcement Learning? | 8, 8, 3 | 8, 8, 6 | 0.94 |
| 117 | 7.33 | End To End Trainable Active Contours Via Differentiable Rendering | 3, 8, 1 | 8, 8, 6 | 0.94 |
| 118 | 7.33 | Stable Rank Normalization For Improved Generalization In Neural Networks And Gans | 3, 8, 8 | 6, 8, 8 | 0.94 |
| 119 | 7.33 | What Can Neural Networks Reason About? | 6, 6, 8 | 8, 6, 8 | 0.94 |
| 120 | 7.33 | Unbiased Contrastive Divergence Algorithm For Training Energy-based Latent Variable Models | 3, 8, 8 | 6, 8, 8 | 0.94 |
| 121 | 7.33 | Gradient Descent Maximizes The Margin Of Homogeneous Neural Networks | 8, 8, 6 | 8, 8, 6 | 0.94 |
| 122 | 7.33 | Disentangling Neural Mechanisms For Perceptual Grouping | 6, 8, 8 | 6, 8, 8 | 0.94 |
| 123 | 7.33 | Poly-encoders: Architectures And Pre-training Strategies For Fast And Accurate Multi-sentence Scoring | 6, 6, 8 | 8, 6, 8 | 0.94 |
| 124 | 7.00 | Encoding Word Order In Complex Embeddings | 8, 3, 6, 6 | 8, 6, 8, 6 | 1.00 |
| 125 | 7.00 | And The Bit Goes Down: Revisiting The Quantization Of Neural Networks | 8, 6, 8, 6 | 8, 6, 8, 6 | 1.00 |
| 126 | 7.00 | Neural Tangent Kernels, Transportation Mappings, And Universal Approximation | 8, 3 | 8, 6 | 1.00 |
| 127 | 7.00 | Biologically Inspired Sleep Algorithm For Increased Generalization And Adversarial Robustness In Deep Neural Networks | 6, 8 | 6, 8 | 1.00 |
| 128 | 7.00 | Language Gans Falling Short | 3, 8 | 6, 8 | 1.00 |
12