Funding: Funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R113), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Language-independent text tokenization can aid the classification of languages with few sources. There is a global research effort to generate text classification for any language. Human text classification is a slow procedure. Consequently, text summary generation for different languages using machine text classification has been considered in recent years. There is no research on machine text classification for many languages, such as Czech, Rome, and Urdu. This research proposes a cross-language text tokenization model using a Transformer technique. The proposed Transformer employs an encoder that has ten layers with self-attention encoding and a feedforward sublayer. This model improves the efficiency of text classification by providing a draft text classification for a number of documents. We also propose a novel Sub-Word tokenization model based on the vocabulary that occurs frequently in the documents. The Sub-Word Byte-Pair Tokenization (SBPT) technique shares the vocabulary of one sentence with other sentences. The Sub-Word tokenization model improves on other Sub-Word tokenization models, such as the pair encoding model, by 10% on the precision metric.
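The sub-word idea in this abstract builds on byte-pair merging. The sketch below shows classic byte-pair encoding (BPE) merge learning only; it is not the paper's SBPT technique, whose details the abstract does not give. The function names and the toy corpus are illustrative assumptions.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (classic BPE)."""
    # Each distinct word starts as a tuple of characters, weighted by its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs over the whole corpus.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Most frequent pair wins; ties are broken lexicographically for determinism.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Apply the learned merges, in order, to a single word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (hypothetical): "ab" is frequent, so it is merged first.
merges = learn_bpe_merges(["abab", "abab", "abc"], num_merges=2)
print(merges)
print(tokenize("abc", merges))
```

Because the merge rules are learned from character statistics alone, the same procedure applies to any language with a written corpus, which is the property the abstract relies on.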
Abstract: Digital assets have been introduced to the global market as one of the innovations with potential, even though their impact on the traditional economy is impossible to measure. Security tokens (STs) stand out because both producers and consumers prefer them: the former obtain financial resources efficiently for their specific projects, while the latter look for STs on trusted and secure global digital platforms that are regulated by public securities sales offices. The research proposes a method based on fuzzy logic theory and its applied models, in particular triangular fuzzy numbers, the Fuzzy Delphi method, expertons, the Hamming distance, and the fuzzy inference system (FIS). The benefits and limitations of the proposal were identified by applying it in an agro-export company. A notable outcome is the route, or algorithm, of the value system to be followed when executing the investments. The research therefore fulfills its objective and is useful for small and medium-sized export 4.0 companies, which are eager to obtain cash flow to improve their technical efficiency and to export their artifacts to global markets. In other words, the producer of goods can obtain an unprecedented benefit in an agile and efficient way in the context of Industry 4.0.
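Two of the fuzzy tools named in this abstract can be illustrated briefly. The sketch below defines a triangular fuzzy number and compares two of them with a normalized Hamming distance (the mean absolute difference of the three defining points). That distance is one common convention and is an assumption here; the abstract does not state which form the authors use, and the class name and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzyNumber:
    """Triangular fuzzy number (low, mode, high) with low <= mode <= high."""
    low: float
    mode: float
    high: float

    def __post_init__(self):
        if not (self.low <= self.mode <= self.high):
            raise ValueError("expected low <= mode <= high")

    def membership(self, x):
        """Degree of membership of x: 1 at the mode, falling linearly to 0 at the ends."""
        if x == self.mode:
            return 1.0
        if self.low < x < self.mode:
            return (x - self.low) / (self.mode - self.low)
        if self.mode < x < self.high:
            return (self.high - x) / (self.high - self.mode)
        return 0.0


def hamming_distance(a, b):
    """Normalized Hamming distance between two triangular fuzzy numbers (assumed convention)."""
    return (abs(a.low - b.low) + abs(a.mode - b.mode) + abs(a.high - b.high)) / 3


# Hypothetical expert ratings on a 0..1 scale: an ideal profile and two candidate tokens.
ideal = TriangularFuzzyNumber(0.6, 0.8, 1.0)
candidates = {
    "token_a": TriangularFuzzyNumber(0.3, 0.5, 0.7),
    "token_b": TriangularFuzzyNumber(0.5, 0.8, 0.9),
}
# The candidate closest to the ideal profile is ranked first.
ranking = sorted(candidates, key=lambda name: hamming_distance(candidates[name], ideal))
print(ranking)
```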
Funding: This work was supported by the Natural Science Foundation of Jiangsu Province (Grant No. BK20231004), the National Natural Science Foundation of China (Grant No. 32401697), and the National Key Research and Development Program of China (2022YFE0116200).
Abstract: The breeding of high-yield wheat varieties is needed to ensure food security. Accurately and rapidly predicting wheat yield at the plot level via UAVs would enable breeders to identify meaningful genotypic variations and select superior lines, thus accelerating the selection of climate-adapted high-yield varieties. Although current prediction models already use multivariate time series data, they usually embed all the raw data with a simple concatenation operation, which results in low prediction accuracy. To address these limitations, we propose an improved transformer-based wheat yield prediction model with a variate-independent tokenization approach. This approach embeds 14 vegetation indices and 28 morphological traits along the feature dimension, enabling the learning of variate-centric representations. We also apply a multivariate attention mechanism to evaluate the contribution of each variate and capture the multivariate correlation. Extensive experiments are conducted to verify the effectiveness of our model, including comparisons across 3 nitrogen treatments, 2 years, and 56 wheat varieties. We also compare our model with state-of-the-art approaches. The experimental results indicate that our model achieves the best prediction performance, with an R² of 0.862, surpassing the classical recurrent neural network and transformer variants. We also confirm that combining the vegetation indices and morphological traits is advantageous over using single-source data for the prediction task, yielding a prediction performance gain of approximately 4%. In conclusion, this study provides a novel approach that uses an improved transformer model and multivariate time series data to quantitatively predict plot-level wheat yield, thus enabling the rapid selection of high-yield varieties for breeding.
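The contrast between concatenation and variate-independent tokenization can be sketched in a few lines. The example below assumes a table of shape [time step][variate] and turns each variate's whole time series into one token, which a shared linear projection then embeds. The variate meanings, values, and weight matrix are hypothetical, and the paper's actual embedding layer may differ.

```python
def variate_tokens(series):
    """Transpose a [time][variate] table so that each variate becomes one token."""
    return [list(column) for column in zip(*series)]


def embed(tokens, weights):
    """Project each token of length T to d_model using a shared [T][d_model] matrix."""
    return [
        [sum(x * w for x, w in zip(token, column)) for column in zip(*weights)]
        for token in tokens
    ]


# Three time steps, two variates (say, one vegetation index and one morphological trait).
series = [
    [0.2, 10.0],
    [0.5, 20.0],
    [0.7, 30.0],
]
# Shared projection from T = 3 time steps to d_model = 2 (fixed values for illustration).
weights = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

tokens = variate_tokens(series)      # 2 tokens, each holding 3 time steps
embedded = embed(tokens, weights)    # 2 tokens, each of width d_model = 2
print(len(tokens), embedded[1])
```

With one token per variate, attention weights between tokens can be read as relationships between variates, which matches the abstract's aim of evaluating each variate's contribution.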
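Abstract: Building polygon simplification methods in map generalization rely on hand-crafted rules, offer a low degree of automation, and make little use of existing simplification results. To address these problems, this paper proposes a building polygon simplification model based on the Transformer mechanism. The model first maps each building polygon into a grid space of a given extent and expresses the polygon's coordinate string as a grid sequence, which yields the token sequences of the polygon before and after simplification and thus the paired samples of building polygon simplification. A Transformer architecture is then used to build the model, whose masked self-attention mechanism learns the dependencies among point sequences from the sample data and finally generates the new simplified polygon point by point. During training, the model uses structured sample data, and a cross-entropy loss function that ignores specific indices is designed to improve simplification quality. The experimental design comprises a main experiment and a generalization test. The main experiment is based on a 1:2000 building dataset of Los Angeles; polygons are encoded with three grid sizes (0.2, 0.3, and 0.5 mm), and simplification is carried out for target scales of 1:5000 and 1:10000. The results show that the model performs best with a grid size of 0.3 mm, where the simplification results on the validation set agree with the manual annotations at a rate above 92.0%, and a generalization experiment on building polygon data from part of Beijing confirms the model's transferability. A comparison with an LSTM model shows that, at a similar parameter scale, the LSTM model fails to converge effectively or to generate usable results. The paper demonstrates the potential of the Transformer for spatial geometric sequence tasks and its ability to reuse existing simplification samples effectively, providing a route of practical engineering value for intelligent building polygon simplification.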
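Two steps in this abstract lend themselves to a small sketch: encoding polygon vertices as grid-cell tokens, and a cross-entropy loss that skips a designated index (for example, padding). The code below is a minimal illustration under stated assumptions: a square grid, row-major cell numbering, and a padding index of -1. None of these choices are taken from the paper.

```python
import math

PAD = -1  # hypothetical index for padded positions that the loss should skip


def polygon_to_tokens(points, cell_size, grid_width):
    """Map each (x, y) vertex to the row-major id of the grid cell that contains it."""
    tokens = []
    for x, y in points:
        col = int(x // cell_size)
        row = int(y // cell_size)
        if not (0 <= col < grid_width and 0 <= row < grid_width):
            raise ValueError(f"vertex {(x, y)} falls outside the grid")
        tokens.append(row * grid_width + col)
    return tokens


def tokens_to_polygon(tokens, cell_size, grid_width):
    """Decode token ids back to cell-center coordinates (lossy by up to half a cell)."""
    return [
        ((t % grid_width + 0.5) * cell_size, (t // grid_width + 0.5) * cell_size)
        for t in tokens
    ]


def cross_entropy_ignore(logits, targets, ignore_index=PAD):
    """Mean cross-entropy over positions whose target is not `ignore_index`."""
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue
        peak = max(row)  # subtract the maximum for a numerically stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum - row[target]
        count += 1
    return total / count if count else 0.0


# A rectangle in arbitrary map units on an 8 x 8 grid with 0.5-unit cells.
rectangle = [(0.2, 0.2), (2.3, 0.2), (2.3, 1.6), (0.2, 1.6)]
tokens = polygon_to_tokens(rectangle, cell_size=0.5, grid_width=8)
print(tokens)

# The second position is padding, so only the first contributes to the loss.
print(cross_entropy_ignore([[0.0, 0.0], [5.0, 1.0]], [1, PAD]))
```

The grid size controls the trade-off the abstract reports on: smaller cells preserve geometry better but enlarge the vocabulary the model must predict over.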
Funding: Supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R195), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: With the rapid growth of online news, fake electronic news detection has become one of the most important topics of modern research. Traditional electronic news detection techniques are generally based on contextual understanding, sequential dependencies, and/or data imbalance, which makes distinguishing genuine from fabricated news a challenging task. To address this problem, we propose a novel hybrid architecture, T5-SA-LSTM, which integrates the T5 Transformer for semantically rich contextual embedding with the Self-Attention-enhanced (SA) Long Short-Term Memory (LSTM). The LSTM is trained using the Adam optimizer, which provides faster and more stable convergence than Stochastic Gradient Descent (SGD) and Root Mean Square Propagation (RMSProp). The WELFake and FakeNewsPrediction datasets are used, which consist of news articles labeled as fake or real. Tokenization and the Synthetic Minority Over-sampling Technique (SMOTE) are used for data preprocessing to ensure linguistic normalization and to address class imbalance. The incorporation of the Self-Attention (SA) mechanism enables the model to highlight critical words and phrases, thereby enhancing predictive accuracy. The proposed model is evaluated using accuracy, precision, recall (sensitivity), and F1-score as performance metrics. The model achieved 99% accuracy on the WELFake dataset and 96.5% accuracy on the FakeNewsPrediction dataset. It outperformed competing schemes such as T5-SA-LSTM (RMSProp), T5-SA-LSTM (SGD), and some other models.
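The SMOTE preprocessing step mentioned here can be sketched as follows. SMOTE operates on numeric feature vectors, so in a text pipeline it would be applied after the articles are vectorized. The sketch creates each synthetic sample by interpolating between a minority-class point and one of its nearest minority neighbours. The function name, the choice of k, and the toy data are illustrative assumptions, not the authors' settings.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Three hypothetical minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=5)
print(len(new_points))
```

Every synthetic point lies on a segment between two real minority samples, so the method enlarges the minority class without copying existing rows.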
Abstract: The infrastructure finance gap has long-standing implications for economic and social development. Owing to low efficiency, high transaction costs, and long transaction times, conventional infrastructure financing instruments are considered major contributors to the increasing mismatch between the need for infrastructure development and available financing. Implemented through smart contracts, blockchain tokenization has shown characteristics that are poised to change the capital stack of infrastructure investment. This study analyzed the first SEC-compliant energy asset security token, Ziyen-Coin, from the perspective of the key participants, relevant regulations, and token offering procedures. Results show that tokenization can improve the liquidity of infrastructure assets, transaction efficiency, and transparency across intermediaries. Conventional infrastructure financing instruments were compared with blockchain tokenization by reviewing the literature on infrastructure finance. The benefits and barriers of tokenizing infrastructure assets were discussed to devise ways of improving infrastructure financing. The study also found that the potential of tokenization has not yet been fully realized because of limited technical infrastructure, regulatory uncertainty, volatility in the token market, and the absence of the public sector. This study contributes to the present understanding of how blockchain technology can be implemented in infrastructure finance and of the role of tokenization in the structure of public-private partnerships and project finance.
Funding: Supported by the National Key Research and Development Program of China (Project No. 2022YFC3320800) and the National Natural Science Foundation of China (Project No. 72571210).
Abstract: Nonfungible tokens (NFTs) have become highly sought-after assets in recent years, exhibiting potential for profitability and hedging. The large and lucrative NFT market has attracted both practitioners and researchers to develop NFT price-prediction models. However, the extant models have weaknesses in terms of model comprehensiveness and operational convenience. To address these research gaps, we propose a multimodal end-to-end interpretable deep learning (MEID) framework for NFT investment. Our model integrates visual features, textual descriptions, transaction indicators, and historical price time series by leveraging the advantages of convolutional neural networks (CNNs), adopts integrated gradients (IG) to improve interpretability, and includes a built-in financial evaluation mechanism that generates not only the predicted price category but also the recommended purchase level. The experimental results demonstrate that the proposed MEID framework performs well on the evaluation metrics. The framework could help investors identify market opportunities and help NFT transaction platforms design smart investment tools and improve transaction volume.
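The integrated gradients (IG) attribution mentioned in this abstract can be illustrated on a toy function. The sketch below approximates the IG path integral with a midpoint Riemann sum and checks the completeness property, namely that the attributions sum to the difference between the model output at the input and at the baseline. The two-feature model and its analytic gradient are hypothetical stand-ins; the paper applies IG to a CNN-based network.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients along the straight path from baseline to x."""
    totals = [0.0] * len(x)
    for step in range(steps):
        alpha = (step + 0.5) / steps  # midpoint of each sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            totals[i] += g
    # Scale the average gradient by the input-minus-baseline difference per feature.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def model(x):
    """Toy differentiable model (hypothetical): f(x) = x0^2 + 3 * x1."""
    return x[0] ** 2 + 3.0 * x[1]


def model_grad(x):
    """Analytic gradient of the toy model."""
    return [2.0 * x[0], 3.0]


x = [2.0, 1.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(model_grad, x, baseline)
print(attributions)

# Completeness check: attributions should sum to f(x) - f(baseline).
print(sum(attributions), model(x) - model(baseline))
```

In a real network the gradient would come from automatic differentiation rather than a hand-written function, and the number of steps would be tuned until the completeness gap is acceptably small.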