XXXVIII SIMPÓSIO BRASILEIRO DE TELECOMUNICAÇÕES E PROCESSAMENTO DE SINAIS - SBrT 2020, 22–25 DE NOVEMBRO DE 2020, FLORIANÓPOLIS, SC

Mitigation of nonlinear phase noise in coherent 16-QAM long-reach PONs by K-nearest neighbors-based classification

Rômulo de Paula, Lúcio Neri Borges, Marcelo Luis Francisco Abbade, and Ivan Aldaya

Abstract: Nonlinear phase noise induced by the Kerr effect is the main nonlinear impairment in single-channel coherent long-reach passive optical networks (LR-PONs). In this work, we explore the capability of the K-nearest neighbors (KNN) algorithm to mitigate this impairment in links with non-negligible fiber dispersion. Simulation results show that when KNN is employed in a 56-Gbps coherent LR-PON with a 100-km range and a 1:64 splitting ratio, the effective Q-factor improves by 0.15 dB with respect to maximum likelihood. This improvement is achieved by setting the parameter K to 13, which requires a training data set of at least 500 symbols.

Keywords: Coherent optical communications; Passive optical networks; Nonlinear phase noise; Machine learning; K-nearest neighbors.

I. INTRODUCTION

Due to their high capacity-distance product, optical fiber links have been adopted not only for long-haul but also for shorter-reach applications [1]. For decades, intensity modulation with direct detection (IM/DD) systems could meet the increasing throughput requirements but, with the popularization of multimedia applications and the migration to cloud services, these systems are becoming obsolete [2]. In this context, the introduction of coherent receivers employing digital signal processors (DSPs) gave birth to the fifth generation of lightwave systems. These systems preserve not only phase but also polarization information, thus enabling the use of advanced modulation formats with unprecedented spectral efficiency [3].
In addition, the adoption of high-performance forward error correction (FEC) codes enabled an increase in the number of points in the constellation, leading to M-ary phase shift keying (PSK), quadrature phase shift keying, and quadrature amplitude modulation (QAM).

In digital coherent systems, linear impairments such as chromatic dispersion (CD), polarization mode dispersion (PMD), and linear phase noise can be compensated using well-established DSP algorithms [4], [5]. On the other hand, the reduction of Kerr-induced nonlinear distortion remains an open problem. In this work we focus on single-channel systems with unrepeated links, i.e., long-reach passive optical networks (LR-PONs), where neither cross-phase modulation (XPM) nor four-wave mixing (FWM) is present and, therefore, self-phase modulation (SPM) is the dominant nonlinear distortion mechanism [1]. In addition to the nonlinear distortion, the absence of mid-span amplifiers and the high transmission loss mean that the noise produced by the photodetector results in a low signal-to-noise ratio (SNR). Additionally, even if the accumulated CD is compensated at the receiver, significant intersymbol interference (ISI) is present along the fiber. The overlap of adjacent symbols then results in a stochastic-like behaviour of the SPM, as it depends on the local intensity [6]. Therefore, mitigating the distortion caused by the complex interplay between SPM and ISI, which results in nonlinear phase noise (NLPN), in combination with the receiver noise becomes challenging, especially in multi-level modulation formats, where higher-amplitude symbols are more affected.

Campus São João da Boa Vista, Universidade Estadual Paulista "Júlio de Mesquita Filho" - Unesp, São João da Boa Vista - SP, e-mails: romulo.junior314@yahoo.com.br, lucio.borges@unesp.br, marcelo.abbade@unesp.br, ivan.aldaya@unesp.br.

Some methods for mitigating nonlinear impairments have already been proposed, both in the optical and in the electrical domains.
In the optical domain, two of the methods that have attracted the most attention are conjugated twin waves [7] and mid-span conjugation [8]. These methods, however, suffer from either low flexibility or reduced capacity. In the electrical domain, on the other hand, the flexibility of digital electronics enables adaptive nonlinear compensation. The traditional electronic approach to nonlinear impairment compensation relies on model inversion, for instance, digital backward propagation (DBP) [9], the inverse Volterra-series transfer function (IVSTF) [10], and Wiener-Hammerstein (WH) models [11], [12]. One of the advantages of these methods is that they are modulation-format agnostic, which makes them flexible in systems where adaptive modulation is used. Unfortunately, the high computational cost of model inversion prevents their adoption in real-time applications. In this context, machine learning has emerged as a feasible lower-complexity alternative with high potential for implementation in future nonlinear mitigation schemes.

Machine learning algorithms can be roughly divided into supervised and unsupervised [13]. Unsupervised algorithms include clustering, as in [14] and [15], in which constellation symbols are classified using histogram-based clustering and expectation maximization, respectively. On the other hand, supervised algorithms require a training set in which both the data and their labels are previously known by the receiver. Supervised algorithms can perform either regression or classification, depending on whether the output is continuous or discrete. In [16], an artificial neural network (ANN) is used as an equalizer to compensate SPM in optical links, whereas in [17] and [18], support vector machines (SVM) and K-nearest neighbors (KNN) algorithms are proposed for