Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D3.txt

Doc_A: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_B: [180-326] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_A: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_B: [649-729] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_A: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_B: [803-864] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_A: [859-946] The depth of representations is of central importance for many visual recognition tasks.
Doc_B: [866-953] The depth of representations is of central importance for many visual recognition tasks.

Doc_A has 36.3636% similarity with Doc_B.
Doc_B has 36.3636% similarity with Doc_A.
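
Note: the directional percentages above are consistent with a sentence-level reading: each figure appears to be the number of matched sentences divided by the sentence count of the document it is attributed to (for example, the 4 matches listed for this pair against 11 sentences in each file give 36.3636%). The following minimal sketch reproduces that calculation under these assumptions; the sentence splitting, normalisation, and matching rules of the tool that produced this report are not documented here, so the helper names and details are illustrative only.

    # Hypothetical reconstruction of the directional similarity figures.
    # Assumptions (not confirmed by this report): documents are split into
    # sentences at ".", "?", "!" boundaries, matching is exact after
    # whitespace normalisation, and each percentage is
    #   100 * matched sentences / sentence count of that document.
    import re

    def sentences(path):
        """Read a file and return its whitespace-normalised sentences."""
        text = open(path, encoding="utf-8").read()
        parts = re.split(r"(?<=[.?!])\s+", text.strip())
        return [" ".join(p.split()) for p in parts if p]

    def directional_similarity(path_a, path_b):
        """Return (A-to-B, B-to-A) similarity as percentages."""
        sents_a, sents_b = sentences(path_a), sentences(path_b)
        matched = set(sents_a) & set(sents_b)
        a_to_b = 100.0 * len(matched) / len(sents_a) if sents_a else 0.0
        b_to_a = 100.0 * len(matched) / len(sents_b) if sents_b else 0.0
        return a_to_b, b_to_a

    if __name__ == "__main__":
        a_to_b, b_to_a = directional_similarity("./data/paper_1.txt",
                                                "./data/paper_1_D3.txt")
        print(f"Doc_A has {a_to_b:.4f}% similarity with Doc_B.")
        print(f"Doc_B has {b_to_a:.4f}% similarity with Doc_A.")

Under this reading, a 100% figure for Doc_B (as for the *_D5 files below) would mean every sentence of Doc_B also occurs in Doc_A, while the lower reverse figure reflects Doc_A's larger sentence count.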


Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D4.txt

Doc_A: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_B: [180-326] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_A: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_B: [649-729] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_A: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_B: [803-864] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_A: [859-946] The depth of representations is of central importance for many visual recognition tasks.
Doc_B: [866-953] The depth of representations is of central importance for many visual recognition tasks.

Doc_A has 36.3636% similarity with Doc_B.
Doc_B has 21.0526% similarity with Doc_A.


Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D5.txt

Doc_A: [0-50] Deeper neural networks are more difficult to train.
Doc_B: [0-50] Deeper neural networks are more difficult to train.
Doc_A: [52-182] We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
Doc_B: [52-182] We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously.
Doc_A: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_B: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_A: [332-492] We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_B: [332-492] We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Doc_A: [494-642] On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_B: [494-642] On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity.
Doc_A: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_B: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_A: [726-794] This result won the 1st place on the ILSVRC 2015 classification task.
Doc_B: [726-794] This result won the 1st place on the ILSVRC 2015 classification task.
Doc_A: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_B: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_A: [859-946] The depth of representations is of central importance for many visual recognition tasks.
Doc_B: [859-946] The depth of representations is of central importance for many visual recognition tasks.

Doc_A has 81.8182% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.
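
The bracketed ranges preceding each quoted sentence read as 0-based, end-inclusive character offsets into the named file: [0-50] above covers exactly the 51-character opening sentence, and the shifted ranges in the other comparisons (e.g., [180-326] versus [184-330]) track edits earlier in the derived files. This is an inference from the spacing of the ranges, not something the report states. A quoted span can be checked against its source with a short snippet like the one below; the slice end is the reported upper bound plus one.

    # Sketch: verify a quoted match by slicing the source file at the
    # reported offsets (assumed 0-based and end-inclusive).
    start, end = 184, 330  # range reported for paper_1.txt above
    text = open("./data/paper_1.txt", encoding="utf-8").read()
    print(text[start:end + 1])  # expected: "We explicitly reformulate the layers ..."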


Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D6.txt

Doc_A: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_B: [180-326] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_A: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_B: [649-729] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_A: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_B: [803-864] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_A: [859-946] The depth of representations is of central importance for many visual recognition tasks.
Doc_B: [866-953] The depth of representations is of central importance for many visual recognition tasks.

Doc_A has 36.3636% similarity with Doc_B.
Doc_B has 44.4444% similarity with Doc_A.


Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D7.txt

Doc_A: [184-330] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_B: [180-326] We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
Doc_A: [644-724] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_B: [649-729] An ensemble of these residual nets achieves 3.57% error on the ImageNet test set.
Doc_A: [796-857] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_B: [803-864] We also present analysis on CIFAR-10 with 100 and 1000 layers.
Doc_A: [859-946] The depth of representations is of central importance for many visual recognition tasks.
Doc_B: [866-953] The depth of representations is of central importance for many visual recognition tasks.

Doc_A has 36.3636% similarity with Doc_B.
Doc_B has 21.0526% similarity with Doc_A.


Doc_A: ./data/paper_1.txt  Doc_B: ./data/paper_1_D8.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D3.txt

Doc_A: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_B: [420-533] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_A: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_B: [535-622] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_A: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_B: [624-705] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_A: [692-906] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_B: [707-921] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_A: [908-1034] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_B: [923-1049] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_A: [1036-1108] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
Doc_B: [1051-1123] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

Doc_A has 75% similarity with Doc_B.
Doc_B has 75% similarity with Doc_A.


Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D4.txt

Doc_A: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_B: [420-533] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_A: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_B: [535-622] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_A: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_B: [624-705] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_A: [692-906] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_B: [707-921] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_A: [908-1034] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_B: [923-1049] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_A: [1036-1108] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
Doc_B: [1051-1123] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

Doc_A has 75% similarity with Doc_B.
Doc_B has 37.5% similarity with Doc_A.


Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D5.txt

Doc_A: [0-161] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Doc_B: [0-161] We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
Doc_A: [163-403] The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters.
Doc_B: [163-403] The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters.
Doc_A: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_B: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_A: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_B: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_A: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_B: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D6.txt

Doc_A: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_B: [420-533] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_A: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_B: [535-622] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_A: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_B: [624-705] Some connections to related algorithms, on which Adam was inspired, are discussed.

Doc_A has 37.5% similarity with Doc_B.
Doc_B has 60% similarity with Doc_A.


Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D7.txt

Doc_A: [405-518] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_B: [420-533] The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients.
Doc_A: [520-607] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_B: [535-622] The hyper-parameters have intuitive interpretations and typically require little tuning.
Doc_A: [609-690] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_B: [624-705] Some connections to related algorithms, on which Adam was inspired, are discussed.
Doc_A: [692-906] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_B: [707-921] We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Doc_A: [908-1034] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_B: [923-1049] Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods.
Doc_A: [1036-1108] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
Doc_B: [1051-1123] Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

Doc_A has 75% similarity with Doc_B.
Doc_B has 37.5% similarity with Doc_A.


Doc_A: ./data/paper_2.txt  Doc_B: ./data/paper_2_D8.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D3.txt

Doc_A: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_B: [172-312] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_A: [829-1005] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Doc_B: [835-1011] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

Doc_A has 33.3333% similarity with Doc_B.
Doc_B has 33.3333% similarity with Doc_A.


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D4.txt

Doc_A: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_B: [172-312] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_A: [829-1005] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Doc_B: [835-1011] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

Doc_A has 33.3333% similarity with Doc_B.
Doc_B has 14.2857% similarity with Doc_A.


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D5.txt

Doc_A: [0-171] We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_B: [0-171] We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes.
Doc_A: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_B: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_A: [315-539] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Doc_B: [315-539] The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Doc_A: [541-665] To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation.
Doc_B: [541-665] To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation.

Doc_A has 66.6667% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D6.txt

Doc_A: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_B: [172-312] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.

Doc_A has 16.6667% similarity with Doc_B.
Doc_B has 25% similarity with Doc_A.


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D7.txt

Doc_A: [173-313] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_B: [172-312] On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art.
Doc_A: [829-1005] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Doc_B: [835-1011] We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

Doc_A has 33.3333% similarity with Doc_B.
Doc_B has 14.2857% similarity with Doc_A.


Doc_A: ./data/paper_3.txt  Doc_B: ./data/paper_3_D8.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D3.txt

Doc_A: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_B: [311-462] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_A: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_B: [464-576] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_A: [577-587] Freund & R.
Doc_B: [578-588] Freund & R.
Doc_A: [739-897] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_B: [745-903] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_A: [899-962] Internal estimates are also used to measure variable importance.
Doc_B: [905-968] Internal estimates are also used to measure variable importance.

Doc_A has 50% similarity with Doc_B.
Doc_B has 55.5556% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D4.txt

Doc_A: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_B: [311-462] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_A: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_B: [464-576] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_A: [577-587] Freund & R.
Doc_B: [578-588] Freund & R.
Doc_A: [739-897] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_B: [745-903] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_A: [899-962] Internal estimates are also used to measure variable importance.
Doc_B: [905-968] Internal estimates are also used to measure variable importance.

Doc_A has 50% similarity with Doc_B.
Doc_B has 29.4118% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D5.txt

Doc_A: [0-193] Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_B: [0-193] Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest.
Doc_A: [195-245] The generalization error for forests converges a.s.
Doc_B: [195-245] The generalization error for forests converges a.s.
Doc_A: [247-308] to a limit as the number of trees in the forest becomes large.
Doc_B: [247-308] to a limit as the number of trees in the forest becomes large.
Doc_A: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_B: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_A: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_B: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_A: [577-587] Freund & R.
Doc_B: [577-587] Freund & R.
Doc_A: [589-737] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise.
Doc_B: [589-737] Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ∗∗∗, 148–156), but are more robust with respect to noise.

Doc_A has 70% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D6.txt

Doc_A: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_B: [311-462] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_A: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_B: [464-576] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_A: [577-587] Freund & R.
Doc_B: [578-588] Freund & R.

Doc_A has 30% similarity with Doc_B.
Doc_B has 50% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D7.txt

Doc_A: [310-461] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_B: [311-462] The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them.
Doc_A: [463-575] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_B: [464-576] Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y.
Doc_A: [577-587] Freund & R.
Doc_B: [578-588] Freund & R.
Doc_A: [739-897] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_B: [745-903] Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting.
Doc_A: [899-962] Internal estimates are also used to measure variable importance.
Doc_B: [905-968] Internal estimates are also used to measure variable importance.

Doc_A has 50% similarity with Doc_B.
Doc_B has 29.4118% similarity with Doc_A.


Doc_A: ./data/paper_4.txt  Doc_B: ./data/paper_4_D8.txt

Doc_A: [577-587] Freund & R.
Doc_B: [582-592] Freund & R.

Doc_A has 10% similarity with Doc_B.
Doc_B has 11.1111% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D3.txt

Doc_A: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_B: [140-418] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_A: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.
Doc_B: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.

Doc_A has 40% similarity with Doc_B.
Doc_B has 40% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D4.txt

Doc_A: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_B: [140-418] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_A: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.
Doc_B: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.

Doc_A has 40% similarity with Doc_B.
Doc_B has 15.3846% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D5.txt

Doc_A: [0-134] In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.
Doc_B: [0-134] In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.
Doc_A: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_B: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_A: [416-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
Doc_B: [416-602] These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.

Doc_A has 60% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D6.txt

Doc_A: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_B: [140-418] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Doc_A has 20% similarity with Doc_B.
Doc_B has 33.3333% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D7.txt

Doc_A: [136-414] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_B: [140-418] Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Doc_A: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.
Doc_B: [604-720] We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results.

Doc_A has 40% similarity with Doc_B.
Doc_B has 15.3846% similarity with Doc_A.


Doc_A: ./data/paper_5.txt  Doc_B: ./data/paper_5_D8.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D3.txt

Doc_A: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_B: [144-256] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_A: [274-353] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_B: [258-337] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_A: [355-496] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.
Doc_B: [339-480] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.

Doc_A has 60% similarity with Doc_B.
Doc_B has 60% similarity with Doc_A.


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D4.txt

Doc_A: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_B: [144-256] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_A: [274-353] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_B: [258-337] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_A: [355-496] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.
Doc_B: [339-480] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.

Doc_A has 60% similarity with Doc_B.
Doc_B has 23.0769% similarity with Doc_A.


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D5.txt

Doc_A: [0-158] Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_B: [0-158] Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Doc_A: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_B: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.

Doc_A has 40% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D6.txt

Doc_A: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_B: [144-256] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.

Doc_A has 20% similarity with Doc_B.
Doc_B has 50% similarity with Doc_A.


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D7.txt

Doc_A: [160-272] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_B: [144-256] This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language.
Doc_A: [274-353] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_B: [258-337] Emphasis is put on ease of use, performance, documentation, and API consistency.
Doc_A: [355-496] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.
Doc_B: [339-480] It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings.

Doc_A has 60% similarity with Doc_B.
Doc_B has 23.0769% similarity with Doc_A.


Doc_A: ./data/paper_6.txt  Doc_B: ./data/paper_6_D8.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D3.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D4.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D5.txt

Doc_A: [0-141] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and decoder.
Doc_B: [0-141] The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and decoder.
Doc_A: [143-237] The best performing models also connect the encoder and decoder through an attention mechanism.
Doc_B: [143-237] The best performing models also connect the encoder and decoder through an attention mechanism.

Doc_A has 50% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D6.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D7.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_7.txt  Doc_B: ./data/paper_7_D8.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D3.txt

Doc_A: [226-288] In this feature space a linear decision surface is constructed.
Doc_B: [232-294] In this feature space a linear decision surface is constructed.
Doc_A: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_B: [398-547] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_A: [545-602] We here extend this result to non-separable training data.
Doc_B: [549-606] We here extend this result to non-separable training data.
Doc_A: [604-717] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_B: [608-721] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_A: [719-894] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Doc_B: [723-898] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 62.5% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D4.txt

Doc_A: [226-288] In this feature space a linear decision surface is constructed.
Doc_B: [232-294] In this feature space a linear decision surface is constructed.
Doc_A: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_B: [398-547] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_A: [545-602] We here extend this result to non-separable training data.
Doc_B: [549-606] We here extend this result to non-separable training data.
Doc_A: [604-717] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_B: [608-721] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_A: [719-894] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Doc_B: [723-898] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 31.25% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D5.txt

Doc_A: [0-90] The support-vector network is a new learning machine for two-group classification problems.
Doc_B: [0-90] The support-vector network is a new learning machine for two-group classification problems.
Doc_A: [92-224] The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space.
Doc_B: [92-224] The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space.
Doc_A: [226-288] In this feature space a linear decision surface is constructed.
Doc_B: [226-288] In this feature space a linear decision surface is constructed.
Doc_A: [290-392] Special properties of the decision surface ensures high generalization ability of the learning machine.
Doc_B: [290-392] Special properties of the decision surface ensures high generalization ability of the learning machine.
Doc_A: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_B: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D6.txt

Doc_A: [226-288] In this feature space a linear decision surface is constructed.
Doc_B: [232-294] In this feature space a linear decision surface is constructed.
Doc_A: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_B: [398-547] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.

Doc_A has 25% similarity with Doc_B.
Doc_B has 40% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D7.txt

Doc_A: [226-288] In this feature space a linear decision surface is constructed.
Doc_B: [232-294] In this feature space a linear decision surface is constructed.
Doc_A: [394-543] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_B: [398-547] The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors.
Doc_A: [545-602] We here extend this result to non-separable training data.
Doc_B: [549-606] We here extend this result to non-separable training data.
Doc_A: [604-717] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_B: [608-721] High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated.
Doc_A: [719-894] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Doc_B: [723-898] We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 31.25% similarity with Doc_A.


Doc_A: ./data/paper_8.txt  Doc_B: ./data/paper_8_D8.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D3.txt

Doc_A: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_B: [309-390] The training procedure for G is to maximize the probability of D making a mistake.
Doc_A: [391-446] This framework corresponds to a minimax two-player game.
Doc_B: [392-447] This framework corresponds to a minimax two-player game.
Doc_A: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_B: [449-598] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_A: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_B: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_A: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
Doc_B: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

Doc_A has 71.4286% similarity with Doc_B.
Doc_B has 71.4286% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D4.txt

Doc_A: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_B: [309-390] The training procedure for G is to maximize the probability of D making a mistake.
Doc_A: [391-446] This framework corresponds to a minimax two-player game.
Doc_B: [392-447] This framework corresponds to a minimax two-player game.
Doc_A: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_B: [449-598] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_A: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_B: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_A: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
Doc_B: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

Doc_A has 71.4286% similarity with Doc_B.
Doc_B has 33.3333% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D5.txt

Doc_A: [0-306] We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Doc_B: [0-306] We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Doc_A: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_B: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_A: [391-446] This framework corresponds to a minimax two-player game.
Doc_B: [391-446] This framework corresponds to a minimax two-player game.
Doc_A: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_B: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_A: [599-717] In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.
Doc_B: [599-717] In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.

Doc_A has 71.4286% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D6.txt

Doc_A: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_B: [309-390] The training procedure for G is to maximize the probability of D making a mistake.
Doc_A: [391-446] This framework corresponds to a minimax two-player game.
Doc_B: [392-447] This framework corresponds to a minimax two-player game.
Doc_A: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_B: [449-598] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.

Doc_A has 42.8571% similarity with Doc_B.
Doc_B has 60% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D7.txt

Doc_A: [308-389] The training procedure for G is to maximize the probability of D making a mistake.
Doc_B: [309-390] The training procedure for G is to maximize the probability of D making a mistake.
Doc_A: [391-446] This framework corresponds to a minimax two-player game.
Doc_B: [392-447] This framework corresponds to a minimax two-player game.
Doc_A: [448-597] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_B: [449-598] In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere.
Doc_A: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_B: [719-848] There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples.
Doc_A: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
Doc_B: [850-977] Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

Doc_A has 71.4286% similarity with Doc_B.
Doc_B has 33.3333% similarity with Doc_A.


Doc_A: ./data/paper_9.txt  Doc_B: ./data/paper_9_D8.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D3.txt

Doc_A: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_B: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_A: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_B: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_A: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_B: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_A: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_B: [686-791] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_A: [1040-1103] Code is available at https://github.com/ShaoqingRen/faster_rcnn.
Doc_B: [1044-1107] Code is available at https://github.com/ShaoqingRen/faster_rcnn.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 62.5% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D4.txt

Doc_A: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_B: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_A: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_B: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_A: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_B: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_A: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_B: [686-791] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_A: [1040-1103] Code is available at https://github.com/ShaoqingRen/faster_rcnn.
Doc_B: [1044-1107] Code is available at https://github.com/ShaoqingRen/faster_rcnn.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 31.25% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D5.txt

Doc_A: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_B: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_A: [113-260] Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_B: [113-260] Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck.
Doc_A: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_B: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_A: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_B: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_A: [567-679] RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Doc_B: [567-679] RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Doc_A: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_B: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.

Doc_A has 75% similarity with Doc_B.
Doc_B has 100% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D6.txt

Doc_A: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_B: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_A: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_B: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_A: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_B: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_A: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_B: [686-791] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.

Doc_A has 50% similarity with Doc_B.
Doc_B has 66.6667% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D7.txt

Doc_A: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_B: [0-111] State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations.
Doc_A: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_B: [262-442] In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Doc_A: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_B: [444-565] An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position.
Doc_A: [681-786] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_B: [686-791] With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features.
Doc_A: [1040-1103] Code is available at https://github.com/ShaoqingRen/faster_rcnn.
Doc_B: [1044-1107] Code is available at https://github.com/ShaoqingRen/faster_rcnn.

Doc_A has 62.5% similarity with Doc_B.
Doc_B has 31.25% similarity with Doc_A.


Doc_A: ./data/paper_10.txt  Doc_B: ./data/paper_10_D8.txt


Doc_A has 0% similarity with Doc_B.
Doc_B has 0% similarity with Doc_A.


