big data analytics https://statistics.sitemasonry.gmu.edu/ en New scalable computing technique will make analyzing Big Data easier  https://statistics.sitemasonry.gmu.edu/news/2024-09/new-scalable-computing-technique-will-make-analyzing-big-data-easier <span>New scalable computing technique will make analyzing Big Data easier </span> <span><span lang="" about="/user/541" typeof="schema:Person" property="schema:name" datatype="">Teresa Donnellan</span></span> <span>Tue, 09/17/2024 - 16:23</span> <div class="layout layout--gmu layout--twocol-section layout--twocol-section--30-70"> <div class="layout__region region-first"> <div data-block-plugin-id="field_block:node:news_release:field_associated_people" class="block block-layout-builder block-field-blocknodenews-releasefield-associated-people"> <h2>In This Story</h2> <div class="field field--name-field-associated-people field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">People Mentioned in This Story</div> <div class='field__items'> <div class="field__item"><a href="/profiles/lwang41" hreflang="en">Lily Wang</a></div> </div> </div> </div> </div> <div class="layout__region region-second"> <div data-block-plugin-id="field_block:node:news_release:body" class="block block-layout-builder block-field-blocknodenews-releasebody"> <div class="field field--name-body field--type-text-with-summary field--label-visually_hidden"> <div class="field__label visually-hidden">Body</div> <div class="field__item"><p><span class="intro-text">With the advancement of data collection techniques, there has been an exponential increase in the availability and complexity of datasets, particularly spatiotemporal data; finding the computing power to analyze such Big Data, however, has remained a challenge for many researchers in various fields. 
Through a collaborative research project funded by the National Science Foundation, George Mason University statistics professor <a href="https://www.gmu.edu/profiles/lwang41">Lily Wang</a> hopes to change that.  </span></p> <figure role="group" class="align-right"><div> <div class="field field--name-image field--type-image field--label-hidden field__item"> <img src="/sites/g/files/yyqcgq241/files/styles/small_content_image/public/2024-09/lily_wang_500x500.png?itok=Ydm1wljU" width="350" height="350" alt="Lily Wang, Professor, Statistics, College of Engineering and Computing. Photo by Creative Services" loading="lazy" typeof="foaf:Image" /></div> </div> <figcaption>Professor Lily Wang, Department of Statistics, College of Engineering and Computing. Photo by Creative Services</figcaption></figure><p>Wang and the Chair of the Department of Statistics at The George Washington University, <a href="https://statistics.columbian.gwu.edu/huixia-wang">Huixia Judy Wang</a>, are developing a form of scalable, distributed computing that could lessen the power demand on any single computer by distributing the analysis across a network of computers.  </p> <p>“In the past, we knew there were insights hidden in the data, but due to computing limitations, we couldn’t access them,” said Lily Wang. “Now, with scalable quantile learning techniques, we can gain a deeper understanding of the entire data distribution and extract insights into variability, outliers, and tail behavior, which are critical for more informed decision-making.” </p> <p>Spatial and temporal data are increasingly being used in such research areas as climate study and health care, among others, noted Lily Wang. </p> <p>“This data richness presents a lot of opportunities for getting deep insights into dynamic patterns over time and space; but it also brings many, many challenges,” said Wang. Large datasets often exhibit heterogeneous and dynamic patterns, requiring new approaches to capture meaningful relationships. 
</p> <p>This project uses two large datasets: the National Environmental Public Health Tracking Network database from the Centers for Disease Control and Prevention and the outdoor air quality data repository from the Environmental Protection Agency. </p> <p>“Both datasets have been challenging to analyze in the past due to their size and complexity,” explained Wang. “But through scalable and distributed learning techniques, we’re now able to handle large-scale heterogeneous data across the entire United States.” </p> <p>One of the project's major innovations is the use of distributed computing to divide the data into smaller, manageable regions. Each region is analyzed separately, and the results are efficiently aggregated to form a comprehensive understanding of the entire dataset. </p> <p>“You can think of it like dividing the U.S. into small regions, analyzing each one separately, and then combining the results to create a comprehensive national analysis,” Wang said. “This method allows us to analyze millions of data points simultaneously without the need for supercomputers.” </p> <p>Beyond its technical goals, the project also emphasizes training the next generation of data scientists. Graduate students at George Mason University and The George Washington University will gain hands-on experience working with real-world data, helping to develop new computational methods. </p> <p>The project began on September 1, 2024, and is expected to last three years. It has already garnered attention, including recognition from the office of Congressman Gerry Connolly (D-VA). </p> <p>The potential applications of this research are far-reaching, from improving air quality predictions to understanding public health trends and beyond. 
Wang explained, “This work empowers researchers and policymakers to leverage vast amounts of data to address rising societal issues more effectively.” </p> </div> </div> </div> <div data-block-plugin-id="field_block:node:news_release:field_content_topics" class="block block-layout-builder block-field-blocknodenews-releasefield-content-topics"> <h2>Topics</h2> <div class="field field--name-field-content-topics field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">Topics</div> <div class='field__items'> <div class="field__item"><a href="/taxonomy/term/791" hreflang="en">Department of Statistics</a></div> <div class="field__item"><a href="/taxonomy/term/836" hreflang="en">Statistics Faculty</a></div> <div class="field__item"><a href="/taxonomy/term/756" hreflang="en">Computational statistics</a></div> <div class="field__item"><a href="/taxonomy/term/736" hreflang="en">Big Data</a></div> <div class="field__item"><a href="/taxonomy/term/1311" hreflang="en">big data analytics</a></div> <div class="field__item"><a href="/taxonomy/term/1371" hreflang="en">Research Interests: Nonstationary Time Series Analysis; Spectral Analysis; Nonparametric Statistics; Big Data; Bayesian Data Analysis; Applications in Medicine</a></div> <div class="field__item"><a href="/taxonomy/term/86" hreflang="en">Research</a></div> </div> </div> </div> </div> </div> Tue, 17 Sep 2024 20:23:22 +0000 Teresa Donnellan 1596 at https://statistics.sitemasonry.gmu.edu Professor applies statistics and AI to land use modeling and real estate pricing  https://statistics.sitemasonry.gmu.edu/news/2024-05/professor-applies-statistics-and-ai-land-use-modeling-and-real-estate-pricing <span>Professor applies statistics and AI to land use modeling and real estate pricing </span> <span><span lang="" about="/user/541" typeof="schema:Person" property="schema:name" datatype="">Teresa Donnellan</span></span> <span>Wed, 05/29/2024 - 12:18</span> <div class="layout 
layout--gmu layout--twocol-section layout--twocol-section--30-70"> <div class="layout__region region-first"> <div data-block-plugin-id="field_block:node:news_release:field_associated_people" class="block block-layout-builder block-field-blocknodenews-releasefield-associated-people"> <h2>In This Story</h2> <div class="field field--name-field-associated-people field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">People Mentioned in This Story</div> <div class='field__items'> <div class="field__item"><a href="/profiles/asafikha" hreflang="en">Abolfazl Safikhani</a></div> </div> </div> </div> </div> <div class="layout__region region-second"> <div data-block-plugin-id="field_block:node:news_release:body" class="block block-layout-builder block-field-blocknodenews-releasebody"> <div class="field field--name-body field--type-text-with-summary field--label-visually_hidden"> <div class="field__label visually-hidden">Body</div> <div class="field__item"><p><span class="intro-text">George Mason University statistics professor Abolfazl Safikhani recently applied his cutting-edge, interdisciplinary research to analyzing land use dynamics and property pricing shifts over time, work that underscores the transformative potential of data-driven insights, especially in urban planning and real estate. </span></p> <p>Safikhani earned bachelor’s and master’s degrees in mathematics before earning a doctorate in statistics. </p> <p>“I decided to do a PhD in statistics because throughout the master’s I had become more and more interested in connecting real world problems to data. And I'm very happy that I made that decision,” he said. 
</p> <figure role="group" class="align-right"><div> <div class="field field--name-image field--type-image field--label-hidden field__item"> <img src="/sites/g/files/yyqcgq241/files/styles/small_content_image/public/2024-05/resize_image_project-1.png?itok=MGREu4F3" width="350" height="350" alt="Abolfazl Safikhani" loading="lazy" typeof="foaf:Image" /></div> </div> <figcaption>Abolfazl Safikhani</figcaption></figure><p>Along with a former colleague in the University of Florida’s urban planning department, Safikhani applied machine learning techniques to a dataset comprising millions of land parcels in Florida. The two endeavored to decipher the intricate dynamics of land use transformations over time and predict future developments with unprecedented accuracy. Their predictions surpassed 98% accuracy. </p> <p>But the team didn't stop with successful predictions. They recognized the importance of understanding the underlying mechanisms driving these predictions. With the addition of a new collaborator, Tianshu Feng in George Mason’s Systems Engineering and Operations Research Department, the researchers aim to present their land use analysis software as explainable artificial intelligence (XAI). By opening the black box of machine learning algorithms, Safikhani hopes local government decision-makers and urban planners can confidently use the software to optimize resource allocation. </p> <p>Another of Safikhani’s projects considers land use and value specifically as they relate to the price of residential real estate. Safikhani’s own experience buying real estate in Fairfax County, Virginia, in 2022 inspired this project. When he asked his real estate agent to estimate a fair price for a certain house, the agent came back with an estimate based on the prices of three comparable local properties that had recently sold. Ever a “quant guy,” Safikhani said, he thought there could be a better way: applying the idea of transfer learning. 
</p> <p>“The big idea of transfer learning is, within your big data set, try to find areas that have similar dynamics to your area of interest. And then use that similarity to improve your prediction,” Safikhani explained. “So, imagine that there is a little neighborhood somewhere in DC or somewhere in Maryland or somewhere in California that has dynamics very similar to the specific neighborhood where you want to buy a house in Northern Virginia. Once you account for some changes, let's say, regulations and things that are different, then the remaining dynamics are their similarities.” </p> <p>He continued, “If you only use your neighborhood, you can have three data points. If you use another, similar neighborhood, it's going to be 20. If you use neighborhoods from other places over the 50 states of the U.S., you may end up getting a thousand data points.” </p> <p>Safikhani is working with a colleague from the University of California, Los Angeles, to secure funding to develop this pricing software. Their preliminary results show the benefit of their proposed model over current pricing systems. </p> <p>Safikhani's research is poised to revolutionize sectors like urban planning and real estate. In fact, his research has attracted the attention of startups keen to translate his findings into real estate–disrupting tools. </p> <p>“It seems there's actually a growing interest in having such AI tools that would understand land use development and then really match it with pricing,” he said. “And sooner or later, this [technology] is going to come out. 
Platforms like Zillow are doing a good job, but there's much more that can be done.” </p> </div> </div> </div> <div data-block-plugin-id="field_block:node:news_release:field_content_topics" class="block block-layout-builder block-field-blocknodenews-releasefield-content-topics"> <h2>Topics</h2> <div class="field field--name-field-content-topics field--type-entity-reference field--label-visually_hidden"> <div class="field__label visually-hidden">Topics</div> <div class='field__items'> <div class="field__item"><a href="/taxonomy/term/1211" hreflang="en">Applied Statistics</a></div> <div class="field__item"><a href="/taxonomy/term/791" hreflang="en">Department of Statistics</a></div> <div class="field__item"><a href="/taxonomy/term/836" hreflang="en">Statistics Faculty</a></div> <div class="field__item"><a href="/taxonomy/term/756" hreflang="en">Computational statistics</a></div> <div class="field__item"><a href="/taxonomy/term/736" hreflang="en">Big Data</a></div> <div class="field__item"><a href="/taxonomy/term/1311" hreflang="en">big data analytics</a></div> <div class="field__item"><a href="/taxonomy/term/1316" hreflang="en">real estate entrepreneurship</a></div> <div class="field__item"><a href="/taxonomy/term/271" hreflang="en">Artificial Intelligence</a></div> <div class="field__item"><a href="/taxonomy/term/286" hreflang="en">AI</a></div> <div class="field__item"><a href="/taxonomy/term/86" hreflang="en">Research</a></div> </div> </div> </div> </div> </div> Wed, 29 May 2024 16:18:12 +0000 Teresa Donnellan 1546 at https://statistics.sitemasonry.gmu.edu