
    Applications and potentials of machine learning in optoelectronic materials research: An overview and perspectives

Chinese Physics B, December 2023

Cheng-Zhou Zhang (張城洲)1 and Xiao-Qian Fu (付小倩)1,2,†

1 School of Information Science and Engineering, University of Jinan, Jinan 250022, China

2 Shandong Provincial Key Laboratory of Network based Intelligent Computing, University of Jinan, Jinan 250022, China

Keywords: optoelectronic materials, devices, machine learning, prior knowledge

1. Introduction

Optoelectronic materials have a wide range of applications in various fields of modern technology, including fiber optic communication, UV detectors and LEDs, power devices, solar cells, optical sensors, and lasers.[1-8] These materials exhibit excellent performance, and with continuous research and development, they keep offering more efficient and reliable solutions.

The research and development of traditional optoelectronic materials require extensive trial, error, and refinement.[9] Meanwhile, computational methods such as density functional theory (DFT),[10] molecular dynamics simulation,[11] and Monte Carlo simulation[12] have been widely used to predict material properties and structures.[13-15] The results of theoretical and computational simulations can guide experiments more effectively; for example, Monte Carlo simulations are used to study carrier transport[16] and finite element methods to simulate optical properties.[17] However, although experimental and theoretical research has produced massive databases, their capabilities are still limited: the methodology is time-consuming and costly, not only for optoelectronic materials but for all other materials as well.

In 2011, the materials genome initiative (MGI) was launched[18] to accelerate the optimization and application of new materials by integrating experimental, theoretical, and computational approaches.[19,20] High-throughput computing is a vital tool and method that provides computational support for the MGI.[21-23] Currently, materials informatics,[24] as an interdisciplinary discipline spanning materials simulation and computer science, is intertwined with the MGI and provides more opportunities for materials research.[25] Figure 1 shows the stages of development of materials science; the integration of knowledge and technology across these stages has driven the progress of the field. The data-driven stage is evolving toward being fast, efficient, and sustainable.

Machine learning (ML) empowers computers to learn from experience and continually enhance their performance across various tasks, broadening the scope of computational applications and improving efficiency. Leveraging ML can significantly reduce the number and cost of traditional trial-and-error experiments on optoelectronic materials. By analyzing large amounts of data and patterns, ML can predict the properties and behaviors of materials. This approach significantly reduces the required experiments and expedites the development process.

According to data from the Web of Science, a search for the topics of machine learning and optoelectronic materials returns a total of 222 papers, 197 of which were published in the past five years; the progress is remarkable. These studies apply ML to material design, process optimization, property prediction, structure analysis, and more. For example, support vector machine (SVM), random forest (RF), and other ensemble models have been important tools for predicting electronic structure properties.[26-29] Deep learning (DL) has been widely used for optoelectronic materials and often requires substantial volumes of data. Convolutional neural network (CNN), recurrent neural network (RNN), generative adversarial network (GAN), and graph neural network (GNN) models are mainly applied to property prediction,[30-32] structure prediction,[33-35] image analysis,[36] and optimization.[37] Nevertheless, many material or device datasets in practice fall short of this scale. Transfer learning (TL) emerges as a valuable technique, enabling DL models to be trained effectively even with limited data.[30,31] With the development of ML in materials science, advanced ML algorithms have gradually been applied to optoelectronics, providing more intelligent, efficient, and adaptive methods for the research and application of optoelectronic materials. Active learning is used in applications such as property prediction,[38] discovering new materials,[39] and capturing atomic force fields.[40] Reinforcement learning[41] improves the efficiency of material optimization.[42] ML interatomic potentials (MLIAPs) and force fields (MLFFs) enable nanosecond-scale molecular dynamics simulations.[43,44]

ML enhances research efficiency, improves material properties, and advances optoelectronic materials through these applications. However, choosing a proper ML model to achieve the desired characteristics remains challenging when dealing with an entirely new material. Review articles can supply methodologies for beginners. Unfortunately, we notice that most of these articles focus on a specific material class, such as new 2D materials,[45] perovskite materials,[46-48] nanoplasmonics,[49] etc. Few cover a wide range of optoelectronic materials and devices.

In this paper, we first provide a brief account of the development of ML and an overview of the typical workflows of ML for optoelectronic materials. Subsequently, we delve into the methodologies and techniques employed to apply ML in predicting and optimizing optoelectronic material structures and device properties. These applications underscore how ML is well positioned to expedite research, enhance performance, and drive the development of optoelectronic materials. Finally, we discuss the challenges and prospects within this domain, offering insights and recommendations for future research directions and trends.

Fig. 1. Advances in materials science.

2. Fundamentals of ML

Research problems in optoelectronic materials are diverse, and different ML algorithms and techniques apply to different problems. Understanding the development of ML can help researchers select appropriate methods for specific optoelectronic materials problems, and understanding the general ML process can help them plan, execute, and improve applications more effectively, increasing the efficiency and quality of research. This section is organized into two subsections, with a focus on the general ML process for optoelectronic materials.

2.1. The development of ML

In 1956, John McCarthy et al. introduced the concept of artificial intelligence (AI) at the Dartmouth Conference, and various theories and algorithms emerged over the following decades of development.[50-52] ML, as a branch of AI, has developed from simple to complex and can be summarized in the following stages.

In the early stages of ML, algorithms relied heavily on manual feature engineering, requiring the manual extraction of relevant features as model inputs. Typical algorithms included linear regression and SVM. While these algorithms performed well in many tasks, the reliance on manual feature engineering limited their applicability. With DL, multilayer neural networks can learn hierarchical representations directly from the raw data, thus eliminating the need for explicit feature engineering. CNN and RNN have demonstrated strong capabilities in feature learning and have made significant progress in natural language processing (NLP).[53] Reinforcement learning focuses on intelligent autonomous decision-making to maximize gains and is suitable for online learning and optimization in control tasks. Current multimodal learning integrates information from different sources, while TL enables knowledge transfer across domains. These models allow ML to solve complex real-world problems.[54]

ML has experienced a leap from weakness to strength, from simple statistical analysis to autonomous learning. Its application domains and boundaries keep expanding, providing novel tools and methods for solving multifaceted challenges.

ML opens up new opportunities in various subfields of materials science. Butler et al.[55] presented advances of ML in the chemical sciences regarding material design, material synthesis, and theoretical guidance. Piccinotti et al.[56] summarized AI's research progress in photonics, nanophotonics, and plasmonics. Schleder et al.[57] outlined how DFT, high-throughput computing, and ML, the modern methods of computational materials science, can be used to design novel materials with enhanced properties. Choudhary et al.[58] outlined advances in DL methods for atomistic simulations, materials imaging, and spectral analysis. In addition, Golmohammadi and Aryanpour[59] collected and organized articles containing keywords such as ML and materials discovery from 2011-2022, predicted the properties of more than thirty materials using ML, and proposed four guidelines to help researchers develop better models. Kadilkar et al.[60] discussed the progress of using ML to design materials with target properties and how ML can reduce the dimensionality of the design space. Liu et al.[61] reviewed advances in research on semiconductor materials and manufacturing, where ML has become a powerful tool, helping to analyze large-scale datasets and solve complex problems in semiconductor materials science and manufacturing. As optoelectronic materials are emerging strategic materials, applying ML to assist their research and development is an important direction of current research.

2.2. ML process for optoelectronic materials

The general ML process for optoelectronic materials is shown in Fig. 2. It typically includes data collection and preprocessing, feature engineering, model selection and training, and model evaluation and optimization. Below, we detail the application of these steps to research in optoelectronics.

Fig. 2. The general ML process for optoelectronic materials.

2.2.1. Data collection and preprocessing

Data collection and preprocessing are the first steps in ML for optoelectronic materials. Constructing high-quality datasets requires acquiring data from multiple sources, including experimental measurements, computational simulation results, and materials reported in the literature. Researchers can collect primary data, such as the components and structures of actually synthesized optoelectronic material samples, and property data, such as measured conductivity and optical properties, from the COD[62] and ICSD[63] databases. Computational material databases include AFLOW,[64] MP,[65] NOMAD,[66] and CMR;[67] these contain property data of optoelectronic materials predicted using first-principles calculations and other methods. Data related to optoelectronic materials can also be extracted from the published literature.

In addition to numerical data, it is possible to create datasets of topographic images of optoelectronic materials obtained with instruments such as scanning electron microscopy (SEM) and atomic force microscopy, and datasets of characteristic spectra obtained with devices such as UV-vis spectrometers.

By integrating and processing heterogeneous data from multiple sources, a comprehensive and high-quality optoelectronic material dataset can be constructed for training ML models, predicting material properties, and discovering new materials. Such rich, high-quality data significantly improve the learning ability and generalization of the models.

ML in optoelectronic materials science is highly dependent on standardized, high-quality data, and the many open initiatives that provide standardized materials data following the FAIR (findable, accessible, interoperable, reusable) principle[68] are of great significance. Table 1 lists some vital standardization initiatives that allow different researchers to compare and reproduce results on a uniform database, accelerating research progress. Statistical techniques and data mining can be used to normalize or discretize the raw data to obtain high-quality data. For example, missing-data imputation[69] can be used to improve the completeness of datasets; sampling strategies[70] can be used to reach model performance benchmarks with less data; and data augmentation[71] can be used to reduce the reliance on labeled data. Through these standardization and quality control measures, we can provide high-quality and uniform data sources for ML, reduce the impact of data noise on the model, and promote the development of research in optoelectronic materials.
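As a minimal illustration of these preprocessing steps, the sketch below imputes missing entries and normalizes a toy dataset with scikit-learn. The column names and values are hypothetical placeholders rather than data from the databases cited above.

```python
# Minimal sketch of common data-cleaning steps for a materials dataset.
# Column names ("band_gap_eV", "conductivity_S_cm") and values are hypothetical.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy dataset with missing entries, standing in for merged multi-source data.
df = pd.DataFrame({
    "band_gap_eV":       [1.1, 2.3, np.nan, 3.4, 1.8],
    "conductivity_S_cm": [430.0, np.nan, 12.5, 0.8, 55.0],
})

# Impute missing values with the column median, then standardize each column.
imputer = SimpleImputer(strategy="median")
scaler = StandardScaler()
X = scaler.fit_transform(imputer.fit_transform(df))
print(X.shape)  # (5, 2): cleaned, normalized feature matrix
```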

Table 1. Standardization initiatives.

2.2.2. Feature engineering

Feature engineering is an integral part of data science and ML and can significantly impact the performance and effectiveness of models. Feature engineering mainly includes feature extraction, construction, and selection.

Feature extraction is a critical step in extracting useful information from raw data. This process helps reduce the dimensionality of the feature matrix, lowering computational complexity and shortening training time. Among dimensionality reduction methods, principal component analysis (PCA)[72] and linear discriminant analysis (LDA)[73] are commonly used techniques. Potapov and Lubk[74] applied PCA to denoise spectral images. They noted that the data must first be smoothed, filtered, and weighted; otherwise, the denoising effect of PCA is reduced and human error is introduced.
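A minimal PCA sketch along these lines is shown below; the synthetic spectra are random placeholders standing in for measured or simulated spectra, and the 95% variance threshold is an arbitrary choice.

```python
# Minimal sketch of PCA-based dimensionality reduction for spectral data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
spectra = rng.normal(size=(200, 1024))       # 200 samples x 1024 wavelength bins (synthetic)

pca = PCA(n_components=0.95)                 # keep components explaining 95% of the variance
reduced = pca.fit_transform(spectra)         # low-dimensional representation
denoised = pca.inverse_transform(reduced)    # reconstruction discards low-variance components
print(reduced.shape, pca.n_components_)
```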

Feature construction builds new features from material data by combining and transforming existing ones. As shown in Fig. 3, features of optoelectronic materials can be constructed in the following ways (a minimal descriptor sketch follows the list and figure).

(i) Basic attribute features: use the basic attributes of the optoelectronic material, such as components, crystal structure parameters, energy gap, electronic density of states, etc., as the initial features.

(ii) Structural descriptors: construct various descriptors of the component and structural information of the material, such as atomic symmetry functions and crystal fingerprints, to capture its topological structural features.

(iii) Graph network features: use GNN or graph convolutional network (GCN) models to extract the connectivity network features of the material that describe the atomic arrangement and crystal symmetry.

(iv) Spectral features: extract features from experimental or computational spectra to construct spectral fingerprints and other descriptors of the optical properties of materials.

(v) Image features: use computer vision techniques to process SEM and transmission electron microscope (TEM) images of materials and extract image features characterizing size, shape, distribution, and other information.

(vi) Multi-source feature fusion: synthesize the features obtained from the above channels to construct a comprehensive material descriptor.

Fig. 3. Feature construction methods for optoelectronic materials.
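The sketch below illustrates a basic composition descriptor of the kind listed in (i) and (ii): stoichiometry-weighted statistics of tabulated element attributes. The element property values and the choice of statistics are illustrative assumptions, not a standard descriptor from the literature.

```python
# Minimal sketch of a composition-based descriptor: statistics of element
# attributes weighted by stoichiometry. Property values below are illustrative only.
import numpy as np

ELEMENT_PROPS = {            # (electronegativity, covalent radius in pm) -- example values
    "Ga": (1.81, 122.0),
    "As": (2.18, 119.0),
    "N":  (3.04,  71.0),
}

def composition_descriptor(composition):
    """composition: dict mapping element -> atomic fraction, e.g. {"Ga": 0.5, "N": 0.5}."""
    props = np.array([ELEMENT_PROPS[el] for el in composition])
    weights = np.array([composition[el] for el in composition])
    mean = np.average(props, axis=0, weights=weights)   # stoichiometry-weighted mean
    spread = props.max(axis=0) - props.min(axis=0)      # range of each attribute
    return np.concatenate([mean, spread])

print(composition_descriptor({"Ga": 0.5, "N": 0.5}))
```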

Feature selection plays a crucial role in data analysis and ML and can improve the performance and interpretability of models. Wang et al.[75] introduced standard feature selection methods for perovskite materials. Whether used individually or in combination, the Pearson correlation coefficient (filter method), recursive feature elimination (wrapper method), and tree-based models (embedded method) appear most frequently. Choosing appropriate feature selection methods can significantly reduce model complexity and improve model interpretability. Through effective feature engineering, data redundancy can be eliminated, the robustness of model predictions can be improved, and the model's learning of the underlying physical mechanisms becomes more explicit. Table 2 lists some commonly used features.
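A minimal sketch of the filter and wrapper routes mentioned above is given below: a Pearson-correlation filter followed by recursive feature elimination (RFE) with a tree-based estimator on synthetic data. The threshold and feature counts are arbitrary choices.

```python
# Minimal sketch of two common feature-selection routes on a synthetic dataset.
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

X, y = make_regression(n_samples=300, n_features=20, n_informative=5, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(20)])

# Filter method: keep features whose |Pearson r| with the target exceeds 0.1.
corr = X.apply(lambda col: np.corrcoef(col, y)[0, 1])
kept = corr[corr.abs() > 0.1].index.tolist()

# Wrapper method: recursive feature elimination with a random forest.
rfe = RFE(RandomForestRegressor(n_estimators=100, random_state=0), n_features_to_select=5)
rfe.fit(X[kept], y)
print([f for f, s in zip(kept, rfe.support_) if s])   # finally selected features
```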

If the dataset is small and the data distribution is simple, too many features give the model excess capacity, resulting in overfitting.[76] When the number of features approaches or exceeds the number of samples, the model can simply memorize each training sample and lose the ability to generalize.

Randomly omitting half of the feature detectors, i.e., dropout,[82] can prevent this problem. This regularization technique prevents neural network overfitting and improves the model's generalization ability. Its basic idea is to randomly drop some of the hidden layer nodes with a certain probability during training, preventing these nodes from becoming overly dependent on the features of other nodes. In testing, all nodes participate in the prediction, but the outputs are scaled to compensate for the nodes ignored during training. By randomly ignoring nodes during training, dropout encourages nodes to learn features that do not depend on each other, thus improving generalization.
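A minimal PyTorch sketch of this technique is given below; the layer sizes and dropout probability are arbitrary illustrative choices.

```python
# Minimal PyTorch sketch of dropout regularization: hidden units are randomly
# zeroed during training, while all units are used at inference time.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),     # drop half of the hidden activations during training
    nn.Linear(64, 1),
)

x = torch.randn(8, 32)
model.train()              # dropout active: a random subset of activations is zeroed
y_train = model(x)
model.eval()               # dropout disabled: all nodes participate in the prediction
y_eval = model(x)
```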

It is also possible to transform the regression problem into a classification problem by simply determining whether the bandgap lies within a specific range rather than predicting its exact value.[83] To better match the needs of practical applications, continuous bandgap values are converted into binary labels, simplifying the task to judging whether the material is usable instead of accurately predicting the bandgap value. This reduces the difficulty of the prediction task and focuses on determining the availability of the material.
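The sketch below shows this recasting on toy data: materials are labeled "usable" if their bandgap falls inside a target window. The 1.1-1.7 eV window, the feature matrix, and the bandgap values are illustrative assumptions.

```python
# Minimal sketch of recasting bandgap regression as binary classification.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

band_gaps = np.array([0.5, 1.2, 1.6, 2.8, 3.4])              # eV, toy values
labels = ((band_gaps >= 1.1) & (band_gaps <= 1.7)).astype(int)  # 1 = "usable" bandgap window

X = np.random.default_rng(0).normal(size=(5, 10))            # stand-in feature matrix
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)
print(clf.predict(X))                                        # usable / not usable
```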

TL can utilize the information contained in rich source-domain data, which is suitable when relatively few samples are available in the target domain. Some material representation methods, such as GNNs and crystal fingerprints, can be applied across various optoelectronic materials to realize TL. These methods are widely used in materials science and other fields to help understand and interpret complex features. They can reduce the difficulty of the target task and make it easier for ML models to capture essential patterns, thus improving the accuracy and interpretability of predictions.

2.2.3. Model selection and training

Model selection and training is an iterative process that needs to be adjusted and optimized according to the actual situation. Expert experience, as well as the types of models and methods used in previous studies, are factors to consider.

ML algorithms are classified into supervised, unsupervised, weakly supervised, and reinforcement learning. The ML models built on these algorithms are shown in Fig. 4. Many papers describe the workings of ML algorithms.[84-87] Depending on the nature and characteristics of the problem, we need to choose the appropriate model architecture.

Supervised learning relies on explicit labeling information, unsupervised learning does not use labeling information, and weakly supervised learning uses partial, incomplete, or inaccurate labeling information. Reinforcement learning learns decision-making strategies by interacting with the environment. The core idea of TL is to improve performance on a new task by utilizing existing knowledge rather than learning from scratch.

Classification algorithms include support vector classification (SVC), logistic regression, decision tree (DT), RF, k-nearest neighbor (KNN), etc., and common regression algorithms include linear regression, support vector regression (SVR), gradient boosting regression (GBR), etc. In an artificial neural network (ANN), whether the task is classification or regression is typically determined by the network's output layer: softmax and sigmoid output layers are employed for classification tasks, while linear and single-neuron output layers are used for regression tasks. Furthermore, the choice of loss function likewise determines whether the network performs classification or regression; for instance, classification tasks typically use cross-entropy loss, whereas regression tasks use mean squared error loss.
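The sketch below makes this distinction concrete in PyTorch: a multi-neuron head paired with cross-entropy loss for classification versus a single linear neuron paired with mean-squared-error loss for regression. The dimensions and random data are illustrative.

```python
# Minimal PyTorch sketch of how the output layer and loss function set the task.
import torch
import torch.nn as nn

features = torch.randn(16, 8)                    # a batch of 16 samples, 8 features each

# Classification: 3 output neurons, cross-entropy loss (softmax is applied inside the loss).
clf_head = nn.Linear(8, 3)
clf_loss = nn.CrossEntropyLoss()(clf_head(features), torch.randint(0, 3, (16,)))

# Regression: single output neuron, mean-squared-error loss.
reg_head = nn.Linear(8, 1)
reg_loss = nn.MSELoss()(reg_head(features).squeeze(-1), torch.randn(16))
print(clf_loss.item(), reg_loss.item())
```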

2.2.4. Model evaluation and optimization

Objectively evaluating model performance is necessary to improve the model's generalization ability and adapt better to unknown data. Researchers usually use cross-validation to measure model performance.[88] Error assessment in classification tasks can use metrics such as accuracy, F1-score, and the AUC-ROC curve; regression tasks can use mean absolute error, root mean square error (RMSE), the R2 score, etc. Different evaluation metrics measure the performance of the model from different perspectives. By tuning hyperparameters, applying regularization techniques, and adopting stacked ML models,[89] the model's performance can be effectively improved and more accurate ML models can be constructed.
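As a minimal sketch of this evaluation step, the code below runs 5-fold cross-validation on a synthetic regression problem and reports MAE, RMSE, and R2; the model choice and data are placeholders.

```python
# Minimal sketch of cross-validated evaluation with several regression metrics.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_validate

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
scores = cross_validate(
    GradientBoostingRegressor(random_state=0), X, y, cv=5,
    scoring=("neg_mean_absolute_error", "neg_root_mean_squared_error", "r2"),
)
print(
    -scores["test_neg_mean_absolute_error"].mean(),    # MAE
    -scores["test_neg_root_mean_squared_error"].mean(), # RMSE
    scores["test_r2"].mean(),                           # R2
)
```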

This subsection summarizes the general ML process. Table 3 lists ML tools for optoelectronic materials. These tools can fully support and accelerate the development of optoelectronic materials. Researchers must establish a solid mathematical foundation, deeply understand ML algorithms and principles, and utilize these tools for rigorous experimental design and validation to avoid misleading results.

Fig. 4. Common ML models.

Table 3. Resources for materials ML.

3. ML applications in optoelectronic materials and devices

This section introduces the applications of ML in optoelectronic materials and devices. The basic information of the related literature is summarized in Table 4, including the research content, best ML algorithms (some of the studies used only one approach), feature selection, and main contributions. Choosing an appropriate model requires considering the specifics of the task, and prior knowledge can help researchers identify more effective methods for dealing with specific problems.

Table 4. Literature analysis.

Table 4. Continued.

3.1. Crystal structure

Studying crystal structures provides essential information for deeply understanding materials' fundamental properties, structural characteristics, and behavior. Takahashi Keisuke and Takahashi Lauren[33] demonstrated the use of Gaussian mixture models to understand the data structure of material databases, with RF used to predict crystal structures. Ziletti et al.[34] constructed a CNN model to automatically classify structures based on crystal symmetry and proposed a descriptor called the 2D diffraction fingerprint. Zhang et al.[35] utilized the bond-valence vector sum (BVVS) and elemental information to describe the complex geometries of perovskites. Their approach was shown to adapt well to small-sample datasets without requiring large training sets, and even with just 10 feature descriptors it can identify space groups and crystal systems.

By learning the latent representation space of crystal structures, the variational autoencoder (VAE) can generate new structures with features similar to known ones. Banko et al.[90] utilized the crystal structure representations learned by a VAE to reveal latent information, such as structural similarities in texture diffraction patterns. Ren et al.[91] introduced a framework for general inverse design (not limited to a specific set of elements or crystal structures). It features an invertible representation that encodes crystals in both real and reciprocal space, together with structure and attribute latent spaces learned by the VAE.

The crystal structure can be represented as a crystal structure graph (CSG). GNNs provide a powerful tool for understanding, modeling, and optimizing crystal structures. Xie and Grossman[92] used crystal structures for material property prediction and design knowledge extraction, proposing the crystal graph convolutional neural network (CGCNN) framework. The crystal graph encodes atomic information and the bonding interactions between atoms. A CNN is then constructed on the CSG to automatically extract, through training, the optimal representation for the target property. CGCNN is able to predict eight different properties with high accuracy.
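The sketch below is a schematic, simplified graph-convolution step in the spirit of CGCNN, not the published implementation: each atom's feature vector is updated from neighboring atom and bond features along every edge. The gating, feature dimensions, and aggregation are illustrative choices.

```python
# Schematic sketch (not the published CGCNN code) of one convolution step on a crystal graph.
import torch
import torch.nn as nn

class SimpleCrystalConv(nn.Module):
    def __init__(self, atom_dim, bond_dim):
        super().__init__()
        self.fc = nn.Linear(2 * atom_dim + bond_dim, atom_dim)

    def forward(self, atom_feats, bond_feats, edges):
        # edges: (n_edges, 2) tensor of (source, target) atom indices
        src, dst = edges[:, 0], edges[:, 1]
        msg = torch.cat([atom_feats[src], atom_feats[dst], bond_feats], dim=-1)
        msg = torch.sigmoid(self.fc(msg))      # gated message along each bond
        out = atom_feats.clone()
        out.index_add_(0, dst, msg)            # aggregate messages onto target atoms
        return out

atoms = torch.randn(4, 16)                     # 4 atoms, 16-dimensional atom features
bonds = torch.randn(6, 8)                      # 6 bonds, 8-dimensional bond features
edges = torch.randint(0, 4, (6, 2))            # random connectivity for illustration
updated = SimpleCrystalConv(16, 8)(atoms, bonds, edges)
```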

GNNs provide a learning approach for representing material structures as graphs. Researchers have proposed GNNs to predict different properties of materials and continuously optimize the structure and performance of the models. Chen et al.[93] developed universal materials graph network (MEGNet) models for predicting the properties of molecules and crystals. Karamad et al.[94] proposed the orbital graph convolutional neural network (OGCNN), which showed the potential to accurately predict the properties of materials such as metals, minerals, and oxides. Park and Wolverton[95] developed an improved variant of the CGCNN model (iCGCNN), which showed a reduction of about 20% in prediction error compared to the original CGCNN model. Choudhary and DeCost[96] presented an atomistic line graph neural network (ALIGNN), which utilizes bond lengths and bond angles as the edge features of the line graph and explicitly and efficiently incorporates the delicate atomic structure information. Lee and Asahi[97] proposed TL using a crystal graph convolutional neural network (TL-CGCNN) to improve prediction accuracy. Wang et al.[98] proposed an attention mechanism-based crystal graph convolutional neural network (A-CGCNN), which enhances the integration of crystal topology and atomic features.

Table 5 compares these models' applications and methods. Most of the models are applied to material property prediction, the inputs are mostly CSGs, and the methodology draws on GNNs for modeling, with various improvements built upon CGCNN.

Table 5. Comparison of the models' applications and methods.

3.2. Properties research

One of the primary objectives of ML is to predict the physical and chemical properties of optoelectronic materials. We can predict and understand materials' optical and electronic properties more accurately, which is crucial for optoelectronic devices, energy conversion, and optical applications. Additionally, research on electronic structure and defects profoundly impacts optoelectronic materials because they directly relate to the material's optical and electronic properties.[99] By delving into the study of these two aspects of properties, we can highlight the importance of ML in optoelectronic materials research and provide powerful tools and methods for the discovery and performance optimization of new materials. These studies contribute to advancing the forefront of science and provide essential support for innovative technologies and sustainable development.

3.2.1. Electronic structure

Bandgap is an essential concept in electronic structure; it describes the energy difference between the valence band and the conduction band of a material. By predicting and controlling the bandgap of semiconductor materials, electronic devices such as transistors, solar cells, and photodetectors can be designed for different applications. When researchers use different models to solve similar problems in different studies, comparing the performance and results of these models can reveal commonalities and differences between them. The SVR and GBR models perform strongly in bandgap prediction.

Mondal et al.[26] used ML to predict the bandgap properties of quaternary III-V semiconductors. The authors emphasized that the most widely used ML model for semiconductor bandgap prediction is the SVM. They trained an SVR model to predict the bandgap and utilized a trained SVC model to filter out unreliable data. The authors also provided a comprehensive view of the material's properties by constructing bandgap phase diagrams of GaAsPSb, allowing researchers to identify under which compositional ranges and strain conditions the material may be suitable for optoelectronic devices and under which it may not.

Zhu et al.,[100] Huang et al.,[101] and Zhuo et al.[102] used first-principles calculations combined with ML to predict the electronic structure properties of materials. Zhu et al.'s objective was to predict the bandgap, valence band maximum (VBM), and conduction band minimum (CBM) of two-dimensional materials; Huang et al. aimed to predict the bandgap and band offset of nitride materials; and Zhuo et al. focused on predicting the bandgap of inorganic solid materials. We analyze the bandgap prediction results of these three studies because the bandgap is a critical parameter determining the optoelectronic properties of materials. Table 6 analyzes the bandgap prediction results of the best models in the three studies, along with their chosen algorithms and features. Missing entries indicate that the best model does not apply to that simple feature space.

Zhu et al. and Huang et al. are very close to each other regarding dataset size. Both use the PBE method, which is less computationally expensive, and the HSE method, which is more accurate but more computationally expensive. Still, there are some differences in the specific content and focus of the studies.

Table 6. Experimental comparison.

Zhu et al. used three models to predict the bandgap and band positions of 2D semiconductor materials and constructed three different feature sets. Set-I relies on the results of PBE calculations, Set-II is based only on the basic properties of the elements, and Set-III is a combination of Set-I and Set-II. The selection and combination of these feature sets provide diversity in their study. Huang et al. focused on predicting the bandgap and band alignment of nitride semiconductor materials, employed six models, such as SVM and LR, and constructed three different feature sets. Set-I contains 18 elemental attributes, Set-II adds the bandgap obtained from PBE calculations as a feature to Set-I, and Set-III constructs a new 26-dimensional feature space using feature engineering. By comparison, the relative importance of the attributes for SVR-based bandgap prediction is, in descending order, electronegativity, covalent radius, valence, and first ionization energy. Zhuo et al. trained an ML model using only experimentally measured bandgaps and compositional information to quickly and accurately predict the material bandgap. The model predictions are closer to the experimental values than the DFT calculations and can accurately predict the bandgap of inorganic solids.

The three studies explored the applicability of modeling algorithms for predicting material properties and provided valuable experience for subsequent work. The results show that, with a certain accuracy, the SVR model can predict the bandgap of materials more efficiently than DFT calculations. In addition, by comparing the RMSEs of the bandgaps in the three articles, we find that the prediction error of Zhuo et al. is still higher than in the first two studies, although their dataset is larger. Hence, the accuracy of bandgap prediction can be improved by utilizing PBE calculation results together with elemental features.

Similarly, Lu et al.,[103] Im et al.,[104] and Ma et al.[105] combined ML with first-principles calculations to rapidly discover desired materials. Lu et al. collected bandgap data of reported organic-inorganic perovskites and established a nonlinear mapping between material features and bandgaps using the GBR algorithm. They screened six suitable lead-free candidate materials from unexplored organic-inorganic perovskites. Further first-principles calculations of the thermal stability, environmental stability, and electronic structures of these six candidates revealed three lead-free hybrid organic-inorganic perovskites that meet the design requirements for ideal optoelectronic materials. Im et al. generated a database of hypothetical lead-free halide double perovskites and obtained their enthalpies of formation and bandgaps using first-principles calculations. Prediction models for the enthalpy of formation and bandgap were built using the GBR algorithm, giving importance scores for each feature. Ma et al. acquired geometric and electronic structure data of 2D octahedral oxyhalides as training datasets for ML models. After testing SVR, RF, bagging, and GBR algorithms, the GBR algorithm was chosen to build the prediction model.

However, there are some differences in feature engineering and research objectives. Lu et al. obtained 14 critical features, including structural characteristics and elemental properties, from 30 initial features, thereby discovering stable lead-free hybrid organic-inorganic perovskites for solar cells. Im et al. used 32 features, including octahedral factors and elemental properties, to find lead-free perovskites for solar cells. Ma et al. obtained 26 essential features, including stacked octahedral factors and elemental properties, from 62 initial features, discovering new two-dimensional oxyhalides for optoelectronic devices. Despite the different research goals, the combination of first-principles calculations and GBR for materials discovery is similar, demonstrating the applicability of this approach to discovering various types of optoelectronic functional materials.

In addition, several studies utilized different approaches to predict the bandgap. Allam et al.[106] utilized an ANN model to explore the chemical space of possible layered perovskite structures and to screen suitable candidates as solar cell materials. They found the number of layers, anion oxidation state, and cation ionic radius to be the most critical factors affecting the bandgap. Choubisa et al.[107] proposed the deep adaptive regressive weighted intelligent network (DARWIN) framework to accelerate material discovery, with a GCN model used to predict the bandgap. The authors used DARWIN to extract several chemical rules that explain the direct-indirect nature of bandgaps, such as the fact that p-block elements with larger atomic masses are more likely to form direct bandgaps. These rules reconfirmed patterns reported in the literature and provided new, statistically significant chemical patterns. In designing stable UV-emitting direct-bandgap materials, the authors found from halide materials that an electronegativity difference within an optimal range is an essential feature for predicting the performance of such materials. Based on this criterion, the authors successfully synthesized K2CuCl3 and characterized its structure and luminescence properties. The results showed that the emission wavelength of K2CuCl3 is below 400 nm, which meets the requirements of the designed UV-emitting materials. First-principles calculations also verified the bandgap and luminescence properties.

Gao et al.[29] screened four new quaternary semiconductors (Ag2InGaS4, AgZn2InS4, Ag2ZnSnS4, and AgZn2GaS4) with direct bandgaps using the RF algorithm. Takahashi et al.[28] visualized and analyzed the data and found several implicit trends determining the material's bandgap. The authors built an RF model to predict the bandgap of perovskite materials, and 11 previously undiscovered perovskite materials with ideal bandgaps and formation energies were found.

By adjusting the types and concentrations of doping elements, the electronic structure of material films can be modified, thereby tuning their bandgap. Zhang and Xu[108] developed a GPR model to predict the bandgap based on lattice constants and grain size. The authors collected experimental data on 65 doped ZnO thin films, and the model achieved excellent bandgap prediction accuracy. It can assist in designing and understanding such multi-doped materials with tunable bandgaps. GPR is a powerful regression method for small samples and uncertainty modeling.

In addition to the bandgap, the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) also provide crucial information about a material's electronic structure and energy level distribution. Saleh et al.[27] predicted the energy levels of organic semiconductor materials in photodetectors to find new building blocks by mining a photovoltaic database. ML methods such as the light gradient boosting machine (LGBM) and GBR models were used to train models for predicting HOMO and LUMO energy levels and power conversion efficiency. Jeong et al.[31] developed a GCN model to predict the HOMO and LUMO energies of molecules. Using TL, they initialized the model with parameters from a previously developed model for predicting spectral properties and then trained it on the HOMO and LUMO energy databases. Comparing from-scratch training to TL, they found that the TL model had lower prediction errors. Using this model to assist in designing and preparing deep-blue OLED devices resulted in an external quantum efficiency of 6.58%. The model enabled rapid and efficient prediction of HOMO and LUMO energies to aid in designing optoelectronic materials.

Studying the properties of quantum wells allows an in-depth understanding of the electronic structure of semiconductor materials. The ground-state energy and effective mass are properties closely related to a quantum well's electronic structure. Da Silva Macedo et al.[109] used ML to rapidly and accurately predict the electronic properties of quantum well semiconductor heterostructures. The authors obtained the eigenstate energies and effective masses of thousands of heterostructure configurations to train an ANN model. The trained model can accurately predict the properties of new heterostructures in milliseconds, roughly 1000 times faster than solving the equations directly. This significantly reduces the computational barriers to designing quantum well devices.

Crystalline materials exhibit excellent electronic and optical properties in optoelectronic devices. Nevertheless, amorphous materials have extensive prospects for specific applications such as flexible electronics and solar cells.[110,111] Deringer et al.[43] revealed the structural and electronic phase transitions of amorphous silicon at the atomic scale under high pressure. The predicted changes in electronic structure indicate a semiconductor-to-metal transition in amorphous silicon under high pressure, which significantly impacts its optoelectronic response. Although optoelectronic applications primarily focus on crystalline silicon, the methods used in this study can be extended to investigate the structural stability and electronic structure of other optoelectronic materials.

3.2.2. Defects

Defects can alter crucial properties of materials, such as electronic structure and conductivity. Understanding and controlling defects can assist in optimizing the performance of materials to meet the requirements of various applications.

Varley et al.[112] showcased a computationally efficient model capable of accurately predicting the formation energies of cation vacancies and pinpointing the positions of their electronic states in an extensive range of II-VI and III-V materials. Importantly, this model eliminates the need for calculations involving supercells composed of ~100 atoms, relying exclusively on parameters derived from the primitive unit cell of the bulk material, typically consisting of 2-4 atoms.

Using a dataset generated through DFT, Mannodi-Kanakkithodi et al.[113] developed accurate and widely applicable models for predicting impurity properties in cadmium-based chalcogenides. The researchers first converted the material systems into numerical descriptors encompassing impurity element attributes, defect site information, and low-cost single-cell calculated defect attributes. Modeling was then performed using RF, kernel ridge regression (KRR), and LASSO regression, and their predictive performances were compared. The RF model can precisely predict impurity properties in cadmium selenide-sulfide mixtures. Furthermore, including PBE-calculated transition levels as descriptors significantly enhances the accuracy of transition level predictions.

Frey et al.[114] utilized a pre-trained MEGNet model for TL to predict critical properties of 2D materials and identify ideal host materials for quantum dot emission. They created nearly ten thousand defect structures and trained an ensemble ML model capable of accurately predicting fundamental defect properties without requiring first-principles calculations. The research team identified 100 promising defect structures, including deep-center defects suitable for quantum emission and engineered dopant defects for resistive switching.

Wang et al.[115] investigated the fluctuation of defect energy levels in metal halide perovskites (MHPs) and its effect on material properties. They developed an MLFF trained on DFT data to perform nanosecond molecular dynamics (MD) simulations and demonstrated that the intrinsic softness and anharmonicity of the MHP lattice lead to large defect level fluctuations. The study provided insights into how dynamic defects can produce desirable MHP properties for solar cells and other optoelectronic applications. Liu et al.[116] used an MLFF to study the structural dynamics of the Σ5(120) grain boundary (GB) in CsPbBr3 on the nanosecond timescale and how the structural changes affect the electronic structure and potentially create detrimental mid-gap trap states. The authors first constructed a CsPbBr3 Σ5(120) GB model and performed ab initio DFT calculations to analyze its initial structure and band structure. Based on the DFT data, they trained an MLFF model and used it to perform longer-timescale ML-based MD (MLMD) simulations. The authors established correlations between structural and electronic property changes by analyzing the GB's structural fluctuations and periodic DFT calculations. Figure 5 shows the simulation results based on the MLFF at the nanosecond scale. The article investigated grain boundary defects in CsPbBr3, revealing how the structural dynamics of grain boundaries affect the electronic structure and providing guidance for defect repair.

Fig. 5. Results based on MLMD. (a) Evolution of the potential energy fluctuation and the frontier energy levels along the MLMD trajectory. (b) Evolution of the energy level differences along the MLMD trajectory and the corresponding distribution histograms. (c) LUMO charge densities and Pb configurations at the labeled states.[116]

3.3. Materials and devices optimization

The optimization of materials and devices in the field of optoelectronics is essential for advancing technology, improving performance, and addressing the growing demand for efficient and sustainable optoelectronic solutions.

By predicting device properties, unnecessary device fabrication or expensive experimentation can be avoided, providing a more cost-effective route to device optimization. Using physics-assisted ML, Pratik et al.[30] established the relationship between process conditions and device capacitance-voltage (C-V) curves. The models were trained using theoretical C-V data at frequencies of 1 kHz, 3 kHz, 5 kHz, 10 kHz, 50 kHz, and 100 kHz and measured C-V data at 1 kHz and 100 kHz, and were subsequently tested at 3 kHz, 5 kHz, 10 kHz, and 50 kHz. The most significant advantage of their data-driven model is that it scales smoothly to any number of process parameters. Glasmann et al.[117] investigated two ML algorithms for the C-V characteristics of InAsSb-based barrier infrared detectors: an ANN was used to predict the capacitance values of the devices, and the results showed that the neural network model had high generalization ability. As shown in Fig. 6, a CNN model is used to predict the doping concentrations of the absorber, barrier, and electrical contact layers, as well as the thickness of the barrier layer, using the C-V curve as input. The results show that the CNN can accurately extract structural parameter information from the C-V curves.

Fig. 6. The CNN framework for predicting capacitance-voltage characteristics.[117]

By carefully designing structures, researchers can customize optoelectronic properties, advancing developments in fields such as solar cells, lasers, photodetectors, and more. Bassman Oftelie et al.[38] constructed a GPR model for predicting the material properties of input heterostructures. Additionally, the authors applied an active learning scheme based on Bayesian optimization (BO), which can discover the optimal heterostructures with minimal first-principles calculations. Through extensive simulation experiments, Epps et al.[42] optimized an AI strategy for autonomous materials synthesis and proposed an efficient ensemble neural network framework based on reinforcement learning to enhance the efficiency of synthesizing quantum dot materials. The algorithm considers the global context of the synthesis process rather than just local optimization, enabling a more efficient search for the global optimum and significantly improving the success rate. Compared to traditional methods, this algorithm can simultaneously optimize multiple objectives, such as emission peak and full width at half maximum, making material optimization more comprehensive and systematic. These findings are of significant importance for advancing automated synthesis technology, and the framework and methods have broad application prospects and potential for expansion in the field of automated synthesis.

Time-series data typically capture the variation of device performance parameters, such as photocurrent, open-circuit voltage, and short-circuit current. RNNs hold significant application value for handling such time-series data. Howard et al.[32] studied the influence of humidity on the luminescent properties of perovskite materials. They conducted dynamic luminescence measurements under controlled relative humidity and established a predictive model using an RNN, demonstrating the potential of ML for predicting the long-term stability of materials. Ramadhan et al.[118] systematically compared the performance of physical and ML models in estimating solar irradiance and photovoltaic power. The study selected and compared ML models such as SVR, RF, RNN, the gated recurrent unit (GRU), and long short-term memory (LSTM) with widely used physical models. LSTM and GRU are two commonly used variants of RNNs, both designed to address issues such as vanishing and exploding gradients in traditional RNNs. Figure 7 provides a comprehensive view of the accuracy of the different models in estimating the global horizontal irradiance (GHI), direct normal irradiance (DNI), and global tilted irradiance (GTI). LSTM and GRU are better for estimating solar radiation and photovoltaic power.

Fig. 7. Comparison of accuracy between ML and physical models on the output variables.[118]
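The sketch below shows the kind of LSTM-based time-series regressor discussed above, predicting the next value of a device parameter from a window of past measurements. The shapes, window length, and random data are illustrative assumptions, not the setup of the cited studies.

```python
# Minimal PyTorch LSTM sketch for time-series device data
# (e.g. forecasting the next photoluminescence-intensity value).
import torch
import torch.nn as nn

class SeqRegressor(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time steps, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # predict the next value from the last hidden state

windows = torch.randn(64, 20, 1)          # 64 sequences of 20 past measurements (synthetic)
targets = torch.randn(64, 1)
model = SeqRegressor()
loss = nn.MSELoss()(model(windows), targets)
loss.backward()                           # one training step's gradient computation
```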

Conversion efficiency directly impacts the performance and usability of solar cells. Wasmer et al.[119] used ensemble learning to predict solar cell conversion efficiency and explain the impacts of different features on the predictions. They analyzed the time trends of the feature impacts, distinguishing features that were more sensitive to short-term efficiency changes from those governing long-term efficiency levels. For example, the wet-bench process significantly influenced the overall efficiency trend and belonged to the short-term influencing factors, whereas wafer base resistivity, emitter sheet resistance, and other inline measurement data had a persistent impact on efficiency that did not vary with time. This study effectively parsed the time trends of solar cell efficiency and its influencing factors through interpretable ML, providing valuable guidance for production process optimization.

The conductivity of semiconductor devices can be changed by adjusting the dopant and doping ratio; different dopants and doping ratios lead to different electronic structures and properties.[120,121] Li et al.[122] predicted the optimal manganese doping concentration to optimize the performance of Mn-doped CMZTSSe solar cells. The authors used GPR to predict an optimal Mn doping concentration of 5% in CMZTSSe solar cells. The experimental results verified this prediction, and a conversion efficiency of 8.9% was obtained for CMZTSSe solar cells with 5% Mn doping. Further characterization showed that, compared with pure CZTSSe, the dominant defect of the 5% Mn-doped CZTSSe solar cell changed from the CuZn antisite to the Cu vacancy (VCu), which is one of the reasons for the improved efficiency. This study once again underscored the reliability of GPR models for small datasets.

Grau-Luque et al.[123] analyzed the influence of off-stoichiometry in Cu2ZnGeSe4 on defect formation and solar cell performance using ML. A combinatorial sample containing around 200 Cu2ZnGeSe4 solar cells with different [Zn]/[Ge] compositions was prepared. The samples were characterized by x-ray fluorescence spectroscopy and Raman spectroscopy and correlated with electronic parameters. Furthermore, the Raman spectral data were analyzed using the LDA algorithm. Specifically, all the Raman spectra measured under different excitation conditions for each cell (a total of 588 spectra) were used as inputs, while efficiency, open-circuit voltage, and the [Zn]/[Ge] ratio were set as outputs. Each output target was classified into four groups containing approximately equal amounts of data. Figure 8 shows the results of the ML analysis of the Raman spectral data for the different classification targets. The training and test scores were close for the efficiency and open-circuit voltage targets. Solar cells with the highest efficiency (around 6.3%) were obtained when 1.05 < [Zn]/[Ge] < 1.15, and this range was considered the optimal cationic composition. The study demonstrated that solar cell efficiency can be effectively predicted solely from Raman spectral and compositional data.

Fig. 8. Results of the ML analysis of the Raman spectroscopic data for different classification targets: (a) efficiency, (b) Voc, and (c) [Zn]/[Ge] ratio.[123]

Quantum efficiency is an essential metric for assessing and optimizing the performance of photonic devices, and DL provides strong support for designing devices with high quantum efficiency. Gómez-Bombarelli et al.[23] designed and synthesized a series of high-performance organic LED emitter materials. The researchers first conducted large-scale quantum chemistry calculations using DFT to systematically predict the optical properties of over one million emissive molecules. They then applied DL methods to prescreen the computational results and identified the most promising candidates. These molecules achieved high maximum external quantum efficiencies of up to 22% after synthesis and characterization.

Zhang et al.[124] developed an enhanced molecular information model based on DL to accurately predict the performance of perovskite LEDs using a small dataset of additive molecules. Using a predicted new additive, they achieved a high external quantum efficiency of 22.7% for perovskite LEDs, significantly higher than the previous best of 22%. This work demonstrated the power of DL for efficient molecular screening with limited data, providing new insights to boost the performance of emerging optoelectronic devices.

A data-driven approach based on real-world data, multidimensional data collection, and high-precision ML models can potentially support device optimization. Anshul et al.[125] tracked the maximum power of photovoltaic panel systems; their ensemble learning algorithm achieved the highest prediction accuracy of 98%. Wang et al.[126] significantly enhanced the performance of filter-free microspectrometers based on perovskite materials, providing important references for integrated microspectrometry systems.

Vandermause et al.[40] developed an interpretable GPR model and an efficient active learning method, enabling the rapid training of accurate machine-learned force fields with minimal first-principles data. They introduced a novel active learning approach named fast learning of atomistic rare events (FLARE), which successfully trained accurate potential energy fields for multiple materials. The obtained machine-learned potentials exhibited a 1000-fold acceleration compared to first-principles calculations while maintaining accurate physical descriptions. The authors demonstrated that the noise uncertainty of the GPR model accurately reflects baseline model errors, while the norm uncertainty assesses model extrapolation errors. Using the FLARE method, the GPR model can actively select new training data points during molecular dynamics simulations based on its own uncertainty estimates. Although this work does not report optoelectronics-specific cases, the adaptability and efficiency of the method suggest that it is highly suitable for studying the interfaces and surfaces of optoelectronic materials, providing a precise and efficient potential energy description.

3.4. Material characterization

ML models can handle various types of data related to optoelectronic materials, including but not limited to images, spectral data, and electron microscopy imaging. This versatility in processing multimodal data empowers ML with remarkable flexibility and broad applicability in optoelectronic materials.

Using CNNs to recognize and classify material films and defects is a reasonable choice. Stern and Schellenberger[127] trained and tested a fully convolutional network (FCN) for defect classification of LED chips. They demonstrated significant success with DL models, particularly the FCN, on LED chip defect classification tasks. Through techniques such as data augmentation, pretrained CNNs, improved connectivity mechanisms, and the application of weighted loss functions, the researchers substantially enhanced the accuracy of defect classification. Furthermore, incorporating additional measurement values into the analysis further improved performance. These findings are significant for automated defect detection and quality control applications.

Imoto et al.[36] developed a method to automatically classify semiconductor process defect images using a CNN and TL to reduce the expensive manual classification workload. The method adopts a two-stage TL strategy: first, a large amount of noisily labeled data is used for network pre-training, and then a small amount of accurately labeled data is used for fine-tuning. Even with minimal training data, the method achieves high classification accuracy. The results show that the method can accurately classify the main defect types and reduce the manual inspection workload by nearly two-thirds, fully utilizing the advantages of TL when training data are insufficient and effectively reducing the labor cost of the semiconductor manufacturing process.

GANs can generate images that help alleviate data scarcity in optoelectronic materials research. This approach is expected to significantly improve the performance of DL models and has potential for a wide range of applications. Vakharia et al.[128] employed ML techniques to analyze and predict the surface properties of hybrid perovskite films. Initially, they utilized a GAN model known as SinGAN to generate multiple morphological images at varying scales from a single image. Subsequently, they extracted ten image quality assessment features from these generated images. To determine the most relevant feature subsets, they applied a heat transfer search algorithm for feature optimization. Finally, these selected feature subsets were input into four ML models, DT, SVM, KNN, and ANN, for recognizing the morphology of different perovskite films. The findings demonstrated that combining the optimized features with the ANN yielded the highest prediction accuracy compared to the other combinations. This study underscored the potential of using GAN-based data augmentation and ML models to automatically recognize film morphology and defects in semiconductor manufacturing.

Fig. 9. Prediction of electrical conductivity using a CNN. Panels (a), (b), and (c) show six DF example images representing low, medium, and high conductivity values. Panel (d) shows the predicted values compared with the true values.[131]

CNNs can also be applied to analyze properties, structures, and components. Chen et al.[129] developed a CNN model to establish correlations between the UV-vis absorption spectra and chemical compositions of gold nanoclusters, enabling the identification of nanocluster compositions from featureless absorption spectra. Wang et al.[130] demonstrated the feasibility of CNN-based TEM image analysis for the structural control of 1T-phase CrTe2. The approaches in this study could be extended to the structural control of other 1T or 2H phase 2D materials, providing vital support for emerging 2D material optoelectronic devices.

Srivastava et al.[131] applied fundamental ML models to address various problems, such as time-series photoluminescence intensity prediction, extrapolative device power output prediction, and extracting conductivity information from images. They proposed a holistic, integrated ML pipeline concept encompassing multiple stages of perovskite solar cell development, from component screening to material fabrication and device testing. This allowed the establishment of feedback loops between different stages, enabling end-to-end learning and optimization from material design to device enhancement.

We take the prediction of conductivity as an example. The researchers constructed a CNN model using dark-field (DF) images, dividing the DF image data of 210 spiro-OMeTAD films into training, test, and validation sets in the ratio of 16:5:4. Seven conductance measurements were acquired for each image at different locations using I-V data, and all measurements for each image were then averaged and normalized. Data augmentation (rotation, scaling) was performed during training to obtain a model that predicts the conductance value of an individual image. In Fig.9, they compared the conductivity predicted by the CNN model with the actual value, establishing a correlation between the visible DF image and the conductivity.
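A schematic version of such an image-to-conductivity regressor is sketched below, with rotation/flip augmentation and random placeholder tensors standing in for the DF images and normalized conductance values; the network depth and image size are assumptions rather than the published configuration.

```python
# CNN regression of a single (normalized) conductivity value from an image.
import torch
import torch.nn as nn

class DFConductivityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # one normalized conductivity value per image

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def augment(batch):
    # Simple flip/rotation augmentation, analogous in spirit to the rotation
    # and scaling used during training in the original study.
    if torch.rand(1) < 0.5:
        batch = torch.flip(batch, dims=[-1])
    k = int(torch.randint(0, 4, (1,)))
    return torch.rot90(batch, k, dims=[-2, -1])

model = DFConductivityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.randn(8, 3, 128, 128)   # placeholder DF images
targets = torch.rand(8, 1)             # averaged, normalized conductance values
for _ in range(3):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(augment(images)), targets)
    loss.backward()
    opt.step()
```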

    3.5.Process optimization

Optoelectronic material processes typically involve tuning multiple parameters and variables with complex interrelationships.[132,133] ML can process such high-dimensional data and uncover the correlations between parameters needed to optimize the process.

Miyagawa et al.[134] optimized the process conditions of TiOx/SiOy/c-Si heterostructures in hydrogen plasma treatment (HPT) using the BO method to improve their surface and interface passivation properties. The researchers first deposited 3 nm TiOx films on c-Si substrates, and some of the samples were oxidized by H2O2 to form a SiOy interlayer before TiOx deposition. The team then performed HPT on the samples, which involved six variable parameters: process temperature (THPT), process time (tHPT), hydrogen pressure (pH2), hydrogen flow rate (RH2), radio-frequency power (PRF), and electrode distance (d). To optimize the HPT process conditions, they first conducted ten sets of randomized experiments to obtain initial data, predicted the best combination of parameters with a GPR model, and repeated the optimization cycle until the global optimum was found. The results showed that the six HPT parameters were optimized in only 15 optimization cycles, and the carrier lifetime was increased to more than three times that of the untreated sample. This study proved that BO is a practical method for the efficient optimization of multidimensional parameters and provides valuable inspiration for preparing high-efficiency silicon heterojunction solar cells using TiOx.
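The underlying optimization loop can be sketched as a GPR surrogate combined with an expected-improvement acquisition over the six process parameters. The parameter ranges and the stand-in objective below are invented for illustration; in practice, each evaluation is an HPT experiment followed by a carrier-lifetime measurement.

```python
# Bayesian optimization with a GPR surrogate and expected improvement.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
bounds = np.array([[100, 300],   # process temperature (assumed range)
                   [1, 30],      # process time
                   [5, 100],     # hydrogen pressure
                   [10, 200],    # hydrogen flow rate
                   [10, 100],    # RF power
                   [10, 50]])    # electrode distance

def run_experiment(x):
    # Stand-in for the measured carrier lifetime; in reality this value
    # comes from the HPT experiment itself.
    return -np.sum((x - bounds.mean(axis=1)) ** 2 / np.ptp(bounds, axis=1) ** 2)

def sample(n):
    return bounds[:, 0] + rng.random((n, 6)) * (bounds[:, 1] - bounds[:, 0])

X = sample(10)                                   # ten initial random runs
y = np.array([run_experiment(x) for x in X])

for _ in range(15):                              # optimization cycles
    gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    cand = sample(2000)                          # random candidate conditions
    mu, sigma = gpr.predict(cand, return_std=True)
    imp = mu - y.max()
    z = imp / (sigma + 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = cand[np.argmax(ei)]                  # most promising condition
    X = np.vstack([X, x_next])
    y = np.append(y, run_experiment(x_next))

print("best conditions:", X[np.argmax(y)], "best objective:", y.max())
```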

Fig.10. Process optimization flowchart.

Wei et al.[135] investigated the use of ML to optimize the chemical deposition method for preparing p-type transparent conductive Cu-Zn-S thin films. The researchers strategically designed two rounds of orthogonal experiments: the first round examined the effects of four parameters, and its results guided the second round to optimize the parameters further. Based on the first round of experiments, the authors built an SVR model to obtain a predictive map of the parameter space; the SVR model was then rebuilt and retrained to guide parameter selection in the second round of experiments. The results show that the optimized Cu-Zn-S thin film can reach a conductivity of 430 S/cm with an average visible light transmittance of 80%, and the p-n heterojunction optoelectronic device prepared from it is two times more efficient than one based on the unoptimized film. This study demonstrates that ML optimization combined with the orthogonal design of experiments can effectively find the optimal parameters for preparing high-performance p-type transparent conductive Cu-Zn-S thin films and improve the efficiency of material and device development. This idea of multi-round optimization is worth learning from and applying, and we summarize the process optimization workflow in the flowchart of Fig.10. The method can be extended to optimize other material synthesis processes.
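A minimal sketch of this DoE-plus-SVR idea is given below, assuming a hypothetical L9 orthogonal array for four factors and invented conductivity values; the real study of course used measured film data and a refined second round of experiments.

```python
# Fit an SVR surrogate to a small orthogonal design of experiments and use
# it to map the parameter space and suggest where to search next.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical L9 orthogonal array: 4 factors at 3 coded levels (0/1/2).
X_doe = np.array([[0, 0, 0, 0], [0, 1, 1, 1], [0, 2, 2, 2],
                  [1, 0, 1, 2], [1, 1, 2, 0], [1, 2, 0, 1],
                  [2, 0, 2, 1], [2, 1, 0, 2], [2, 2, 1, 0]], dtype=float)
y_cond = np.array([55, 120, 90, 210, 300, 140, 260, 180, 95], dtype=float)  # invented S/cm

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0))
model.fit(X_doe, y_cond)

# Predict over a dense grid to obtain a map of the parameter space that can
# guide the second round of experiments around the predicted optimum.
grid = np.array(np.meshgrid(*[np.linspace(0, 2, 11)] * 4)).reshape(4, -1).T
pred = model.predict(grid)
print("predicted best parameters (coded levels):", grid[np.argmax(pred)])
```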

ML can optimize and adjust fabrication parameters such as deposition rate, temperature, and gas flow rate; tuning these parameters improves the properties of the thin-film layers and makes semiconductor device performance more reliable. Jin et al.[136] optimized a showerhead design for semiconductor manufacturing to improve airflow uniformity. The authors developed a CAD showerhead parameter model with 30 design variables, including the stem, back plate, porous baffle, and panel dimensions. The main design variables were the hole diameter distribution on the panel and the fillet radius of the stem. The authors collected data through CFD simulation and established the relationship between design variables and flow uniformity using linear regression and GPR models. Through sensitivity analysis, they found that the hole diameters in the inner regions and the second fillet radius significantly affected the airflow uniformity. The authors then used genetic algorithm (GA), stochastic gradient descent (SGD), and BO algorithms to optimize the panel design. Like conventional optimization algorithms, GA and SGD required a large amount of data and ran independently. Both algorithms found the optimal design, improving the airflow uniformity by 5.7% compared to the baseline. Based on Gaussian process modeling and acquisition functions, the BO algorithm utilized CFD feedback; it needed only one tenth of the baseline design data and improved the airflow uniformity by 9.6% compared to the baseline, reducing the computational cost by a factor of ten.

However, when considering the fillet radius design variables, the performance of BO was inferior to that of the GA and SGD algorithms. The main reasons were as follows. First, the BO algorithm relies on the GPR model, whose fitting accuracy may not be as high as that of other models trained on large datasets. Second, the fillet radius variables had local optima, in which BO was more prone to become trapped. Finally, the large range of the fillet radius variables resulted in an overly large search space, which was also disadvantageous for BO. In comparison, GA and SGD were more capable of handling complex, high-dimensional design spaces through multi-round iterative search.
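For comparison, a bare-bones genetic algorithm over a 30-dimensional design vector can be sketched as below; the quadratic placeholder objective stands in for the CFD-based uniformity metric (or a surrogate trained on CFD data), and the population size, crossover, and mutation settings are arbitrary choices.

```python
# Genetic algorithm: iterative selection, crossover, and mutation of a
# population of candidate design vectors.
import numpy as np

rng = np.random.default_rng(1)
DIM, POP, GEN = 30, 40, 50

def uniformity(x):
    # Placeholder objective to maximize; a real run would query a CFD
    # simulation or a regression/GPR surrogate trained on CFD data.
    return -np.sum((x - 0.5) ** 2)

pop = rng.random((POP, DIM))
for _ in range(GEN):
    fitness = np.array([uniformity(ind) for ind in pop])
    parents = pop[np.argsort(fitness)][-POP // 2:]           # keep the better half
    children = []
    while len(children) < POP - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        mask = rng.random(DIM) < 0.5                          # uniform crossover
        child = np.where(mask, a, b)
        child += rng.normal(0, 0.02, DIM) * (rng.random(DIM) < 0.1)  # sparse mutation
        children.append(np.clip(child, 0, 1))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([uniformity(ind) for ind in pop])]
print("best design vector (first five entries):", best[:5])
```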

GA has shown potential for application in the optimal design of optoelectronic materials. For example, Zhu et al.[137] used a GA to optimize the design of III-V nitride light-emitting diodes and balanced luminous efficiency and operating voltage. Patra et al.[138] established a model based on a GA to predict and optimize the defect configurations in monolayer MoS2, which improves the optoelectronic conversion efficiency. The BO algorithm excels at tackling highly noisy and non-convex complex problems, making it applicable to optimizing the structure and composition of optoelectronic materials. SGD, on the other hand, can be utilized for training large-scale DL models to predict the performance of optoelectronic materials. These algorithms can be employed individually or in combination to optimize optoelectronic materials cooperatively. They hold great promise in advancing the design and optimization of optoelectronic materials, thereby contributing to the further development of optoelectronic technologies.

We summarize representative ML methods from the last six years in Fig.11; their application and development are expected to bring breakthroughs in optoelectronic materials. These methods improve the performance and efficiency of optoelectronic materials and expand their functions and applications, providing strong support for developing a new generation of high-performance, low-cost, environmentally friendly, and sustainable optoelectronic technologies.

Figure 11(a) breaks through the computational barriers of DFT, making first-principles methods infinitely scalable; it represents a milestone in materials science. Figure 11(b) employs DL to predict complex spectra from composition and structure, avoiding cumbersome first-principles calculations; this model is vital for advancing various functional materials, including optoelectronics. Figure 11(c), employing NLP techniques such as word embeddings, was used for the first time to analyze the scientific literature and extract materials-science knowledge. This method has provided a fresh perspective for materials research, and its framework can be extended to predict various functional materials and applied to papers in other scientific domains. Figure 11(d) surpasses various limitations of empirical potentials and, owing to its high precision, can be widely applied to crystalline, defective, liquid, and interfacial materials; such potentials have opened up new frontiers in atomic simulation and provide robust support for the theoretical exploration and optimization of optoelectronic materials. Zuo et al.[142] comprehensively evaluated the current major ML-IAPs, provided insights into the trade-offs between accuracy, training data requirements, and computational cost, and offered benchmarks and guidance for future ML-IAP development and applications. Figure 11(e) has pioneered a new field in crystal structure prediction and understanding using DL; it is a significant milestone in the development of materials representation learning and GNNs. Figure 11(f) has introduced a new approach to materials reverse engineering: it generates structural parameters from given spectral responses, representing an emerging research area in the application of optoelectronic materials. Figure 11(g), utilizing DL for efficient first-principles calculations, has significantly reduced the time required compared with traditional computational methods and is one of the essential achievements in the development of optoelectronic materials.

Fig.11. Representative machine learning methods. (a) The materials learning algorithms (MALA) framework.[139] (b) The Mat2Spec model architecture.[140] (c) Corpus training process.[141] (d) ML-IAP development workflow.[142] (e) The CGCNN model.[92] (f) The inverse design network (IDN).[143] (g) An overview of the JARVIS infrastructure.[144]

    4.Conclusion and perspectives

We have outlined the general process of applying ML to optoelectronic materials. We have discussed standard methods for data collection and feature engineering in optoelectronic materials and have addressed the issue of overfitting. Our main focus has been on the applications of ML in optoelectronic materials. Through discussions of various research works, we have examined the impact of feature selection on research outcomes and summarized the applicability of different ML models in studying optoelectronic materials and device properties. SVM and ensemble learning methods like GBR demonstrate exceptional performance in handling high-dimensional sparse data, making them invaluable for predicting material properties. The GPR model performs particularly well with small datasets, such as in predicting band gaps after doping. RNNs are well-suited to managing time-series data, such as assessing the stability of solar cells and estimating the lifetime of excited states. GNNs excel at modeling the intricate relationships between material structure and properties, encompassing electronic band structures, conductivity, and optical properties. CNNs are widely employed for defect detection and property prediction tasks. Reinforcement learning is effectively applied to optimize materials and device performance. TL proves its worth in scenarios where data is limited. Active learning offers an efficient means to reduce the search space and experimental costs through iterative feedback, making it a valuable tool for process optimization. ML-IAPs and MLFFs are pivotal in investigating the properties, reactions, and behaviors of molecular systems.

ML has excellent potential in optoelectronic materials. However, the challenges in data requirements, transferability, and feature selection must be overcome to better apply these methods to real-world materials research and design.

(i) The main goal of optoelectronic databases is to collect, store, manage, and share various data, including experimental, computational, and literature data. Researchers can utilize NLP techniques to search for the required information. One of the goals of the materials genome initiative is to unify the materials innovation infrastructure (MII) by extending its scope to include the properties, synthesis, processing, and fabrication methods of optoelectronic materials, thereby building a comprehensive data infrastructure.

(ii) Property research is the focus of optoelectronic materials. Taking defect research as an example, we first build a defect database containing structural and chemical information about defects and extract features based on this information. Hand-designed features can exploit domain knowledge and a priori information and be tailored to specific semiconductor domains and tasks, whereas automatically generated features can handle large-scale and complex datasets. We can therefore use automatic feature selection and dimensionality reduction techniques to simplify the feature space and discover more data patterns and relationships. We can then train different models, such as regression algorithms to predict defect properties and classification algorithms to predict defect energy level locations. According to the prediction results of the models, combined with the material's properties, we can screen out the eligible candidates (a schematic sketch of this workflow is given after this list).

(iii) According to the no free lunch theorem,[145] if algorithm A outperforms algorithm B on a specific dataset, there must be another dataset on which algorithm A performs worse than algorithm B. This theorem reminds us to consider the particular material system and performance requirements when choosing an ML algorithm. Different algorithms may be better adapted to different types of materials and properties. Therefore, we must take a cautious, experimentally evaluated approach to determine which algorithm performs best in a given materials science problem. The ability of machines to actively collect and analyze data through active learning will be one of the future directions of ML for photovoltaic materials. Labeled data is often time-consuming and costly to obtain compared with unlabeled data, which is relatively easy to access, giving semi-supervised learning potential for a wide range of applications in semiconductor materials science. Applying already trained models or knowledge to optoelectronic materials through TL across material systems can reduce the need for new data and increase the generalizability and adaptability of models.
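As a schematic realization of the defect-screening workflow outlined in (ii), the sketch below combines automatic feature selection with a regressor for a defect property and a classifier for the defect level position; all feature values, targets, and model choices are placeholders rather than recommendations.

```python
# Defect screening: feature selection + regression for a defect property
# and classification for the defect level position.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
X = rng.random((200, 40))                            # 40 structural/chemical descriptors
e_form = X[:, 0] * 2 + X[:, 3] - X[:, 7] + rng.normal(0, 0.1, 200)  # toy formation energy
is_deep = (X[:, 1] + X[:, 5] > 1.0).astype(int)      # toy shallow/deep level label

reg = make_pipeline(SelectKBest(f_regression, k=10), GradientBoostingRegressor())
clf = RandomForestClassifier(n_estimators=200)

print("formation-energy R^2:", cross_val_score(reg, X, e_form, cv=5).mean())
print("level-position accuracy:", cross_val_score(clf, X, is_deep, cv=5).mean())

# Candidates can then be screened by combining predicted properties with
# additional criteria, e.g., keeping only shallow defects with low formation energy.
```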

With the advent of new supercomputers,[146] more powerful computational capabilities and innovative algorithms will allow the simulation of more realistic material models, which will be necessary for the discovery of tailored materials and the prediction of material behaviors. Meanwhile, self-driving labs,[147] which combine ML, laboratory automation, and robotics, promise to accelerate the discovery of new materials and molecules. We can expect more innovations and breakthroughs in optoelectronic materials from these emerging technologies.

    Acknowledgments

Project supported by the National Natural Science Foundation of China (Grant No.61601198) and the University of Jinan PhD Foundation (Grant No.XBS1714).
