Abstract: Many datasets in E-commerce have rich information about items and the users who purchase or rate them. This information can enable advanced machine learning algorithms to extract and assign user sentiments to various aspects of the items, leading to more sophisticated and justifiable recommendations. However, most Collaborative Filtering (CF) techniques rely mainly on users' overall preferences toward items, and there is no conceptual and computational framework that enables an understandable aspect-based AI approach to recommending items to users. In this paper, we propose concepts and computational tools that can sharpen the logic of recommendations and that rely on users' sentiments along various aspects of items. These concepts include the sentiment of a user towards a specific aspect of a specific item, the emphasis that a given user places on a specific aspect in general, the popularity and controversy of an aspect among groups of users, clusters of users emphasizing a given aspect, clusters of items that are popular among a group of users, and so forth. The framework introduced in this study is developed in terms of user emphasis, aspect popularity, aspect controversy, and user and item similarity. Towards this end, we introduce the Aspect-Based Collaborative Filtering Toolbox (ABCFT), whose tools are all built on a three-index sentiment tensor with indices for user, item, and aspect. The toolbox computes solutions to the questions alluded to above. We illustrate the methodology using a hotel review dataset with around 6000 users, 400 hotels, and 6 aspects.
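To make the tensor-based quantities concrete, here is a minimal NumPy sketch assuming a dense sentiment tensor S[user, item, aspect] with NaN marking missing opinions; the toy data and the particular proxies used for emphasis and controversy are illustrative assumptions, not the ABCFT implementation.

```python
import numpy as np

# Toy sentiment tensor S[user, item, aspect] with values in [-1, 1]; NaN = no opinion.
rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(100, 40, 6))
S[rng.random(S.shape) < 0.8] = np.nan               # most (user, item, aspect) cells empty

# Sentiment of a user towards an aspect, averaged over the items that user reviewed.
user_aspect_sentiment = np.nanmean(S, axis=1)        # shape (users, aspects)

# Emphasis proxy: how often a user comments on an aspect at all.
emphasis = np.mean(~np.isnan(S), axis=1)             # shape (users, aspects)

# Popularity of an aspect: mean sentiment over all users and items.
popularity = np.nanmean(S, axis=(0, 1))              # shape (aspects,)

# Controversy proxy: spread of per-user sentiment on an aspect.
controversy = np.nanstd(user_aspect_sentiment, axis=0)

print(popularity.round(2))
print(controversy.round(2))
```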
Abstract: The Incremental Newton (IN) iteration, proposed by Iannazzo, is stable for computing the matrix pth root, and its computational cost is O(n^3 p) flops per iteration. In this paper, a cost-efficient variant of the IN iteration is presented. The computational cost of the variant agrees well with O(n^3 log p) flops per iteration for values of p up to at least 100.
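As a rough, hedged illustration of where the p versus log p factor comes from, the sketch below runs the plain simplified Newton update for the principal pth root (not Iannazzo's stabilized IN iteration) and forms the X^{p-1} factor by binary powering, so that factor costs O(n^3 log p) rather than O(n^3 p) matrix work per iteration; the test matrix is deliberately close to the identity, where this naive update behaves.

```python
import numpy as np

def pth_root_newton(A, p, iters=50):
    """Simplified Newton update X <- ((p-1) X + X^{1-p} A) / p for the principal
    p-th root of A, started from X = I. This is a textbook sketch, NOT the
    stabilized Incremental Newton iteration; it is used here only on a matrix
    close to the identity, where it converges."""
    X = np.eye(A.shape[0])
    for _ in range(iters):
        # matrix_power uses binary powering, so X^{p-1} costs O(n^3 log p).
        X_pow = np.linalg.matrix_power(X, p - 1)
        X = ((p - 1) * X + np.linalg.solve(X_pow, A)) / p   # solve gives X^{1-p} A
    return X

rng = np.random.default_rng(1)
E = 0.05 * rng.standard_normal((5, 5))
A = np.eye(5) + (E + E.T) / 2            # symmetric, close to the identity
X = pth_root_newton(A, p=100)
print(np.linalg.norm(np.linalg.matrix_power(X, 100) - A))   # ~ machine precision
```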
Abstract: This investigation is focused on conducting a thorough analysis of Municipal Solid Waste Management (MSWM). MSWM encompasses a range of interdisciplinary measures that govern the various stages involved in managing unwanted or non-utilizable solid materials, commonly known as rubbish, trash, junk, refuse, and garbage. These stages include generation, storage, collection, recycling, transportation, handling, disposal, and monitoring. The waste materials mentioned in this context include a wide range of items, such as organic waste from food and vegetables, paper, plastic, polyethylene, iron, tin cans, deceased animals, byproducts from demolition activities, manure, and various other discarded materials. This study aims to provide insights into the possibilities of enhancing solid waste management in the Farmgate area of Dhaka North City Corporation (DNCC). To accomplish this objective, the research examines the conventional waste management methods employed in this area. It conducts extensive field surveys, collecting valuable data through interviews with local residents and key individuals involved in waste management, such as waste collectors, dealers, intermediate dealers, recyclers, and shopkeepers. The results indicate that significant amounts of distinct waste categories are produced daily: food and vegetable waste amounts to 52.1 tons/day; polythene and plastic total 4.5 tons/day; metal and tin-can waste amounts to 1.4 tons/day; and paper waste totals 5.9 tons/day. This study highlights the significance of promoting environmental consciousness to effectively shape the attitudes of urban residents toward waste disposal and management. It emphasizes the need for collaboration between authorities and researchers to improve the current waste management system.
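For quick reference, summing only the four categories quoted above gives their combined daily tonnage; this is simple arithmetic on the reported figures, not an additional survey result.

```python
# Reported daily generation in the Farmgate area (tons/day), from the figures above.
waste_tpd = {
    "food and vegetable": 52.1,
    "polythene and plastic": 4.5,
    "metal and tin cans": 1.4,
    "paper": 5.9,
}
# Sums only these four quoted categories, not every waste stream surveyed.
print(f"listed categories combined: {sum(waste_tpd.values()):.1f} tons/day")   # 63.9
```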
Funding: We thank Dr. David Dayton at RTI International for his help and valuable comments. We acknowledge a contribution from North Carolina Agricultural and Technical State University, supported by funds partially provided by the U.S. Department of Energy (Grant #: EE0003138) and the U.S. National Science Foundation (Grant #: HRD-1242152). Mention of a trade name, proprietary product, or company name is for presentation clarity and does not imply endorsement by the authors or the university.
Abstract: A dense discrete phase model combined with the kinetic theory of granular flows was used to study the bubbling characteristics and segregation of poly-dispersed particle mixtures in a thin fluidized bed. Our simulations showed that, when using the hybrid Eulerian-Lagrangian method, the common use of one computational cell in the thickness direction of the thin bed does not predict wall friction correctly. Instead, a three-cell discretization of the thickness direction does predict the wall friction well, but six cells were needed to prevent overprediction of the bed expansion. The change in the specularity factor (SF) of the model not only affected the predicted particle velocities, but also had a considerable impact on the flow pattern. A decrease in SF, which decreases wall friction, led to over-prediction of the bubble size, particle velocities, and void fraction of the bed, and to a shift of the circulation center toward the bottom of the bed. The segregation of Geldart B particles was studied in the narrow range from 400 to 600 μm with a standard deviation of less than 10% of the average diameter. Simulations showed that large particles accumulated close to the distributor at the bottom of the bed and at the center of the bed, while small particles moved towards the wall and the top surface. A decrease in the mean particle size and in the spread of the size distribution improves mixing by up to 30% at a superficial gas velocity of around 2.5 times the minimum fluidization velocity. Log-normal mixtures with a small proportion of large particles had the most uniform distribution, with a thin layer of jetsam forming at the bottom of the bed. Finally, experimental verification of the segregation and mixing of polydisperse particles with narrow size distributions is suggested.
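As a purely illustrative example of how such a narrow polydisperse feed might be specified, the sketch below samples a log-normal size distribution with a mean of about 500 μm and a standard deviation under 10% of the mean, truncated to the 400-600 μm range studied above; all parameter values are assumptions, not the settings used in this work.

```python
import numpy as np

rng = np.random.default_rng(42)
mean_um, sd_um = 500.0, 45.0                             # hypothetical mean and sd (<10%)
sigma = np.sqrt(np.log(1.0 + (sd_um / mean_um) ** 2))    # log-normal shape parameter
mu = np.log(mean_um) - 0.5 * sigma ** 2                  # log-normal scale parameter

d = rng.lognormal(mu, sigma, size=200_000)               # particle diameters [um]
d = d[(d >= 400.0) & (d <= 600.0)]                       # keep only the narrow cut

print(f"mean = {d.mean():.1f} um, sd/mean = {d.std() / d.mean():.3f}")
print(f"coarse fraction above 550 um: {(d > 550.0).mean():.3f}")
```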
Abstract: Dropping fractions of users or items judiciously can reduce the computational cost of Collaborative Filtering (CF) algorithms. The effect of this subsampling on the computing time and accuracy of CF is not fully understood, and clear guidelines for selecting optimal or even appropriate subsampling levels are not available. In this paper, we present a Density-based Random Stratified Subsampling using Clustering (DRSC) algorithm in which the desired Fraction of Users Dropped (FUD) and Fraction of Items Dropped (FID) are specified, and the overall density during subsampling is maintained. Subsequently, we develop simple models of the Training Time Improvement (TTI) and the Accuracy Loss (AL) as functions of FUD and FID, based on extensive simulations of seven standard CF algorithms as applied to various primary matrices from MovieLens, Yahoo Music Rating, and Amazon Automotive data. Simulations show that both TTI and a scaled AL are bi-linear in FID and FUD for all seven methods. The TTI linear regression of a CF method appears to be the same for all datasets. Extensive simulations illustrate that TTI can be estimated reliably with FUD and FID only, but AL requires considering additional dataset characteristics. The derived models are then used to optimize the levels of subsampling, addressing the tradeoff between TTI and AL. A simple sub-optimal approximation was found, in which the optimal AL is proportional to the optimal Training Time Reduction Factor (TTRF) for higher values of TTRF, and the optimal subsampling levels, such as the optimal FID/(1-FID), are proportional to the square root of TTRF.
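As a minimal sketch of the bookkeeping involved, the snippet below drops users and items at random at given FUD and FID levels from a toy rating matrix; plain random dropping is only a stand-in for DRSC, which additionally stratifies users and items by density via clustering, and the matrix sizes and density are illustrative assumptions.

```python
import numpy as np

def subsample(R, fud, fid, rng):
    """Drop each user with probability FUD and each item with probability FID."""
    keep_users = rng.random(R.shape[0]) >= fud
    keep_items = rng.random(R.shape[1]) >= fid
    return R[np.ix_(keep_users, keep_items)]

rng = np.random.default_rng(0)
# Toy rating matrix: ~2% of the cells hold a rating in 1..5, the rest are 0 (missing).
R = (rng.random((2000, 1500)) < 0.02) * rng.integers(1, 6, size=(2000, 1500))

S = subsample(R, fud=0.3, fid=0.2, rng=rng)
print(S.shape, f"density = {(S > 0).mean():.4f}")   # density stays near 2% on average
```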
Abstract: We consider the numerical integration of the Hunter-Saxton equation, which models the propagation of weakly nonlinear orientation waves. For the equation, we present two weak forms and their Galerkin discretizations. The Galerkin schemes preserve the Hamiltonian of the equation and can be implemented with cheap H^1 elements. Numerical experiments confirm the effectiveness of the schemes.
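For orientation, one common form of the Hunter-Saxton equation and the quadratic energy that Hamiltonian-preserving Galerkin schemes of this kind typically conserve are recalled below; the precise weak forms and invariant used in the paper may differ.

```latex
% Hunter--Saxton equation (one common form) and its quadratic energy:
\[
  \bigl(u_t + u\,u_x\bigr)_x \;=\; \tfrac{1}{2}\,u_x^{2},
  \qquad
  \mathcal{H}(u) \;=\; \frac{1}{2}\int u_x^{2}\,\mathrm{d}x .
\]
```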
Abstract: While the log law is an equation theoretically derived for the near-bed region, the power law has in most cases been studied by experimental methods. Thus, many consider it an empirical equation, and fixed power-law exponents such as 1/6 and 1/7 are generally applied. However, the exponent of the power law is an index of bed resistance related to relative roughness, and it furthermore influences the shape of the vertical velocity distribution. The purpose of this study is to investigate the characteristics of vertical velocity distributions in natural rivers by testing and optimizing previous methods used to determine the power-law exponent, using vertical velocity distribution data collected with ADCPs from rivers in South Korea during the years 2005 to 2009. The roughness coefficient was calculated from the Limerinos equation. Using theoretical and empirical formulae representing the relationship between bed resistance and the power-law exponent, it was evaluated whether the exponents suggested by these equations appropriately reproduce the vertical velocity distributions of actual rivers. As a result, it was confirmed that the power-law exponent tends to increase as bed resistance increases. Therefore, in order to correctly predict vertical velocity distributions in natural rivers, it is necessary to use an exponent that reflects the flow conditions in the field.
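As a simple illustration of how a power-law exponent can be extracted from a measured profile, the sketch below fits u(z) = u_max (z/h)^(1/m) by linear regression in log-log space; the profile is synthetic, not the ADCP field data used in this study.

```python
import numpy as np

h = 3.0                                        # flow depth [m] (assumed)
z = np.linspace(0.1, 0.9, 9) * h               # measurement heights above the bed [m]
u = 1.2 * (z / h) ** (1.0 / 6.5)               # synthetic "measured" velocities, true m = 6.5
u += 0.01 * np.random.default_rng(3).standard_normal(u.size)   # measurement noise

# In log-log space u = u_max (z/h)^(1/m) is a straight line with slope 1/m.
slope, intercept = np.polyfit(np.log(z / h), np.log(u), 1)
print(f"fitted exponent 1/m = {slope:.3f}  (m = {1.0 / slope:.1f})")
```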
Funding: This effort was partially funded by the Intelligence Advanced Research Projects Activity (IARPA). We are grateful to Haluk Akay, Hyungseok Kim, and Wujie Wang for helpful discussions and suggestions.
Abstract: In the wake of the rapid surge in COVID-19-infected cases seen in Southern and West-Central USA in the period of June-July 2020, there is an urgent need to develop robust, data-driven models to quantify the effect which early reopening had on the increase in infected case counts. In particular, it is imperative to address the question: how many infected cases could have been prevented had the worst-affected states not reopened early? To address this question, we have developed a novel COVID-19 model by augmenting the classical SIR epidemiological model with a neural network module. The model decomposes the contribution of quarantine strength to the infection time series, allowing us to quantify the role of quarantine control and the associated reopening policies in the US states which showed a major surge in infections. We show that the upsurge in infected cases seen in these states is strongly correlated with a drop in the quarantine/lockdown strength diagnosed by our model. Further, our results demonstrate that, in the event of a stricter lockdown without early reopening, the number of active infected cases recorded on 14 July could have been reduced by more than 40% in all states considered, with the actual reduction in infections exceeding 100,000 for the states of Florida and Texas. As we continue our fight against COVID-19, our proposed model can be used as a valuable asset to simulate the effect of several reopening strategies on the evolution of the infected count for any region under consideration.
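The sketch below shows schematically where such a quarantine-strength module plugs into an SIR model: infected individuals are removed at a time-dependent rate Q(t) into a quarantined compartment, and a counterfactual without the post-reopening drop in Q(t) can be compared against the baseline. In the paper Q(t) is parameterized by a neural network trained on case data; here it is a hand-picked placeholder, and all rates and population numbers are illustrative assumptions rather than fitted values.

```python
def q_strength(t, reopened=True):
    """Placeholder quarantine strength: drops after day 60 if the region 'reopened'."""
    return 0.12 if (reopened and t >= 60.0) else 0.30

def simulate(reopened, beta=0.35, gamma=0.1, N=1e6, I0=1000.0, days=120, dt=0.1):
    """Forward-Euler integration of an SIR model augmented with a quarantine term Q(t)*I."""
    S, I, R, T = N - I0, I0, 0.0, 0.0
    for step in range(int(days / dt)):
        t = step * dt
        Q = q_strength(t, reopened)
        dS = -beta * S * I / N
        dI = beta * S * I / N - gamma * I - Q * I    # quarantine removes infected people
        dR = gamma * I
        dT = Q * I                                   # quarantined compartment
        S, I, R, T = S + dt * dS, I + dt * dI, R + dt * dR, T + dt * dT
    return I

print(f"active infections at day 120, reopening:    {simulate(True):,.0f}")
print(f"active infections at day 120, no reopening: {simulate(False):,.0f}")
```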
Abstract: The elementary analysis of this paper presents explicit expressions for the constants in the a priori error estimates for the lowest-order Courant, Crouzeix-Raviart nonconforming, and Raviart-Thomas mixed finite element methods for the Poisson model problem. The three constants and their dependence on some maximal angle in the triangulation are indeed all comparable and allow accurate a priori error control.
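The estimates in question have the schematic shape below (shown for the Courant element, in standard textbook notation rather than the paper's); the constant C and its dependence on the maximal angle of the triangulation are exactly what the paper makes explicit.

```latex
\[
  \|\nabla(u - u_h)\|_{L^2(\Omega)} \;\le\; C\,h\,\|D^2 u\|_{L^2(\Omega)} ,
\]
% with h the mesh size and u the exact solution of the Poisson model problem;
% analogous estimates hold for the Crouzeix--Raviart and Raviart--Thomas methods.
```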
Abstract: We postulate and analyze a nonlinear subsampling accuracy loss (SSAL) model based on the root mean square error (RMSE) and two SSAL models based on the mean square error (MSE), suggested by extensive preliminary simulations. The SSAL models predict accuracy loss in terms of subsampling parameters such as the fraction of users dropped (FUD) and the fraction of items dropped (FID). We seek to investigate whether the models depend on the characteristics of the dataset in a constant way across datasets when using the SVD collaborative filtering (CF) algorithm. The dataset characteristics considered include various densities of the rating matrix and the numbers of users and items. Extensive simulations and rigorous regression analysis led to empirical, symmetrical SSAL models in terms of FID and FUD whose coefficients depend only on the data characteristics. The SSAL models came out to be multi-linear in terms of the odds ratios of dropping a user (or an item) vs. not dropping it. Moreover, one MSE deterioration model turned out to be linear in the FID and FUD odds, with their interaction term having a zero coefficient. Most importantly, the models are constant in the sense that they are written in closed form using the considered data characteristics (densities and numbers of users and items). The models are validated through extensive simulations based on 850 synthetically generated primary (pre-subsampling) matrices derived from the 25M MovieLens dataset. Nearly 460,000 subsampled rating matrices were then simulated and subjected to the singular value decomposition (SVD) CF algorithm. Further validation was conducted using the 1M MovieLens and the Yahoo! Music Rating datasets. The models were constant and significant across all three datasets.
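Written schematically in assumed notation (not the paper's exact coefficients), the description above corresponds to a model in the dropping odds of the following form:

```latex
\[
  x_u = \frac{\mathrm{FUD}}{1-\mathrm{FUD}}, \qquad
  x_i = \frac{\mathrm{FID}}{1-\mathrm{FID}}, \qquad
  \mathrm{AL} \;\approx\; c_u\,x_u + c_i\,x_i + c_{ui}\,x_u x_i ,
\]
% where the coefficients depend only on dataset characteristics (density, numbers of
% users and items), the FID/FUD symmetry suggests c_u = c_i, and for one of the
% MSE-based models the interaction coefficient c_{ui} is zero.
```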
Funding: Partly supported by the WCU program through KOSEF (R31-2008-000-10049-0); supported by DFG Research Center MATHEON.
Abstract: We compare 13 different a posteriori error estimators for the Poisson problem with lowest-order finite element discretization. Residual-based error estimators compete with a wide range of averaging estimators and estimators based on local problems. Among our five benchmark problems, we also look at two examples with discontinuous isotropic diffusion and their impact on the performance of the estimators. (Supported by DFG Research Center MATHEON.)
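For context, the prototypical residual-based estimator for the lowest-order (P1) discretization of the Poisson problem reads, in its textbook form (the exact scalings of the 13 compared estimators may differ):

```latex
\[
  \eta_R^2 \;=\; \sum_{T\in\mathcal{T}} h_T^2\,\|f\|_{L^2(T)}^2
          \;+\; \sum_{E\in\mathcal{E}} h_E\,
                \bigl\|\,[\partial u_h/\partial\nu_E]\,\bigr\|_{L^2(E)}^2 ,
\]
% with h_T, h_E the element and edge sizes and [.] the jump of the discrete normal
% flux across interior edges; the volume term keeps only f because Delta u_h = 0
% on each element for P1 functions.
```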
Abstract: High-attenuation object-induced streaking and shadow artifacts in computerized tomography (CT) are somewhat connected to the misfit of the X-ray projection data to the range space of the Radon transform. This misfit is mainly due to the beam-hardening factor of the projection data, which is unavoidable for polychromatic sources. The major difficulty in dealing with the beam-hardening-induced streaking and shadow artifacts comes from their highly nonlinear nature, which depends on the geometries of high-attenuation objects. In this work, we investigate the mathematical characteristics of those streaking and shadow artifacts from the structure of the projection data. We also propose a metal artifact reduction method incorporating the recent technique of the nonlinear beam-hardening corrector. Numerical simulations show that the proposed method effectively alleviates the streaking artifacts without changing the background images.
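For reference, the misfit can be described with the usual textbook models (notation assumed here, not necessarily the paper's): a monochromatic source yields Radon-transform data, while a polychromatic source with normalized spectrum S(E) yields the nonlinear, beam-hardened data below.

```latex
\[
  P_{\mathrm{mono}}(\ell) \;=\; \int_{\ell}\mu(x)\,\mathrm{d}x ,
  \qquad
  P_{\mathrm{poly}}(\ell) \;=\; -\ln\!\int S(E)\,
      \exp\!\Bigl(-\int_{\ell}\mu(x,E)\,\mathrm{d}x\Bigr)\mathrm{d}E ,
\]
% P_poly generally falls outside the range of the Radon transform unless the
% attenuation mu is energy-independent, which is the source of the artifacts above.
```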