The natural environment contains far more microorganisms than have been cultivated to date. Only about 1% of microorganisms in nature have been isolated and grown in pure culture. The remaining 99% that have not been isolated and cultured represent a vast, largely untapped resource.
Metagenomic sequencing technology developed in recent years can recover genomic information from all microorganisms in an environment and can assemble nearly complete genomes of prokaryotes that have not yet been isolated and cultivated. This technology has made it possible to predict the substrate metabolism capacity of uncultured microorganisms from genomic information and, in turn, to design customized media intended to make uncultured microorganisms cultivable.
Microbial growth requires energy, nutrients, and appropriate physical and chemical conditions. Microorganisms in the environment differ greatly in their adaptability to temperature, so providing a suitable culture temperature is one of the key factors in isolating and culturing uncultivated bacteria. When isolating and cultivating microorganisms from the environment, researchers usually use the temperature of the source environment as the cultivation temperature, but the optimal growth temperature of the target prokaryotes is not necessarily consistent with the environmental temperature.
Setting the culture temperature of an isolation experiment according to the optimal growth temperature of the target prokaryotes can improve the efficiency of isolation and culture. Rapidly and simply estimating the optimal growth temperature of a microorganism from its genome sequence has been an active topic in microbial bioinformatics in recent years. Here, we propose a deep learning method based on k-mer distributions that uses only genome sequences to predict the growth temperature of prokaryotes. The method does not require genome annotation, is simple to operate, and offers fast and accurate prediction.
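To make the idea of a k-mer distribution concrete, the following minimal sketch shows one way such a feature vector could be computed from a raw genome sequence without annotation. It is an illustration only and not the method's actual implementation: the function name `kmer_distribution`, the default `k=3`, and the choice to skip windows containing ambiguous bases are all assumptions, and the deep learning model that would consume these features is not shown.

```python
from collections import Counter
from itertools import product


def kmer_distribution(sequence, k=3):
    """Return the normalized frequency of every DNA k-mer in `sequence`.

    Hypothetical helper for illustration. Windows containing characters
    other than A, C, G, or T (for example N) are skipped (an assumption).
    """
    alphabet = "ACGT"
    valid = set(alphabet)
    seq = sequence.upper()

    # Count every overlapping window of length k that has only valid bases.
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1

    total = sum(counts.values())

    # Emit all 4**k possible k-mers in a fixed order so that vectors from
    # different genomes are directly comparable.
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}
```

Calling `kmer_distribution(genome, k=3)` on a genome string yields 64 frequencies that sum to 1 (or all zeros for a sequence shorter than `k`). A vector of this kind could then serve as the input to a regression model for growth temperature.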