Abstract: Dynamic spectrum access is regarded as an effective solution to radio spectrum scarcity and inefficient spectrum usage: it allows secondary users to access licensed spectrum dynamically for data transmission whenever that spectrum is idle. Spectrum sensing, however, is one of its key challenges. Because a secondary user has limited sensing capability, the spectrum sensing order problem is investigated so that the frequency band with the highest probability of being idle is found as early as possible, yielding more spectrum access opportunities. Since the idle probabilities of the bands are unknown to the secondary users and change over time, an online learning framework is proposed in which the spectrum sensing order problem is formulated as a classical multi-armed bandit problem and solved with an online learning method referred to as satisficing discounted Thompson sampling. Simulation results indicate that, compared with other algorithms, the proposed algorithm yields more spectrum opportunities and tracks changes in the idle probabilities.
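As a rough illustration of the bandit formulation, the sketch below implements plain discounted Thompson sampling with Beta posteriors over the per-band idle probabilities. It is a minimal sketch under stated assumptions, not the proposed algorithm: the satisficing rule is not specified in this abstract and is therefore omitted, and the class name `DiscountedThompsonSensing`, the discount factor `gamma`, the sensing budget `max_sense`, the helper `simulate`, and the example idle probabilities are all hypothetical.

```python
import random


class DiscountedThompsonSensing:
    """Hypothetical sketch: Beta-Bernoulli Thompson sampling with discounted counts.

    The satisficing component named in the abstract is NOT implemented here.
    """

    def __init__(self, n_bands, gamma=0.95, seed=None):
        self.n_bands = n_bands
        self.gamma = gamma                # assumed discount factor in (0, 1]
        self.idle = [0.0] * n_bands       # discounted count of "idle" observations
        self.busy = [0.0] * n_bands       # discounted count of "busy" observations
        self.rng = random.Random(seed)

    def sensing_order(self):
        """Sample an idle probability per band and sort bands, most promising first."""
        samples = [
            self.rng.betavariate(1.0 + self.idle[k], 1.0 + self.busy[k])
            for k in range(self.n_bands)
        ]
        return sorted(range(self.n_bands), key=lambda k: samples[k], reverse=True)

    def update(self, observations):
        """Discount all counts, then add the new observations {band: is_idle}."""
        for k in range(self.n_bands):
            self.idle[k] *= self.gamma
            self.busy[k] *= self.gamma
        for band, is_idle in observations.items():
            if is_idle:
                self.idle[band] += 1.0
            else:
                self.busy[band] += 1.0


def simulate(learner, idle_probs, n_slots, max_sense, env_rng):
    """Return the fraction of slots in which an idle band was found.

    In each slot the user senses bands in the learner's order and stops at the
    first idle band or after max_sense bands (the assumed sensing budget).
    """
    found = 0
    for _ in range(n_slots):
        observations = {}
        for band in learner.sensing_order()[:max_sense]:
            is_idle = env_rng.random() < idle_probs[band]
            observations[band] = is_idle
            if is_idle:
                found += 1
                break
        learner.update(observations)
    return found / n_slots


if __name__ == "__main__":
    env = random.Random(0)
    learner = DiscountedThompsonSensing(n_bands=6, gamma=0.95, seed=1)
    # Illustrative environment: band 0 is usually idle, then band 5 takes over.
    before = simulate(learner, [0.9, 0.2, 0.2, 0.2, 0.2, 0.2], 1000, 2, env)
    after = simulate(learner, [0.2, 0.2, 0.2, 0.2, 0.2, 0.9], 1000, 2, env)
    print(f"idle band found: {before:.2f} before the change, {after:.2f} after")
```

Discounting shrinks old counts, so the posterior of a band that has not been sensed recently drifts back toward the uniform prior. This renewed uncertainty is what lets the ordering follow idle probabilities that change over time.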