Managing Product Page Hubs
The ProductPageHub class implements the asynchronous context manager
protocol and provides methods for updating pages and subscribing to events
in bulk, and for serializing data.
Creating Product Page Hubs
Instantiating ProductPageHub does not automatically establish a client
session. Session management and data updates are handled explicitly through
the provided methods.
import asyncio

from freshpointsync import ProductPageHub


async def main() -> None:
    print('Initializing hub session...')
    hub = ProductPageHub()
    try:
        await hub.start_session()
        print('Adding pages to the hub...')
        await hub.new_page(location_id=122, fetch_contents=True)
        await hub.new_page(location_id=296, fetch_contents=True)
        pages = ', '.join(
            f"'{page.data.location}' (ID={page_id})"
            for page_id, page in hub.pages.items()
        )
        print(f'Successfully added pages {pages} to the hub!')
    finally:
        print('Closing hub session...')
        await hub.close_session()


if __name__ == '__main__':
    asyncio.run(main())
In the example above, a new ProductPageHub instance is created. The session
is started manually, and the pages with IDs 122 and 296 are added. Lastly, the
session is closed in the finally block to ensure proper cleanup.
The same can be achieved using the asynchronous context manager.
import asyncio

from freshpointsync import ProductPageHub


async def main() -> None:
    print('Initializing hub session...')
    # The session is started on entering the context manager,
    # so no explicit start_session() call is needed here.
    async with ProductPageHub() as hub:
        print('Adding pages to the hub...')
        await hub.new_page(location_id=122, fetch_contents=True)
        await hub.new_page(location_id=296, fetch_contents=True)
        pages = ', '.join(
            f"'{page.data.location}' (ID={page_id})"
            for page_id, page in hub.pages.items()
        )
        print(f'Successfully added pages {pages} to the hub!')


if __name__ == '__main__':
    asyncio.run(main())
The example above is equivalent to the previous one. The client session is created automatically when the context manager is entered and closed when it exits.
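To make the equivalence concrete, here is a minimal sketch of how an asynchronous context manager maps onto the manual try/finally pattern. DemoHub is a hypothetical stand-in written for this illustration, not the actual ProductPageHub implementation; only the method names start_session and close_session are taken from the text above.

```python
import asyncio


class DemoHub:
    """Hypothetical stand-in that records session lifecycle events."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def start_session(self) -> None:
        self.events.append('start')

    async def close_session(self) -> None:
        self.events.append('close')

    async def __aenter__(self) -> 'DemoHub':
        # Entering the `async with` block starts the session.
        await self.start_session()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Leaving the block closes the session, even after an exception.
        await self.close_session()


async def manual() -> list[str]:
    hub = DemoHub()
    try:
        await hub.start_session()
        hub.events.append('work')
    finally:
        await hub.close_session()
    return hub.events


async def managed() -> list[str]:
    async with DemoHub() as hub:
        hub.events.append('work')
    return hub.events


manual_events = asyncio.run(manual())
managed_events = asyncio.run(managed())
```

Both coroutines record the same sequence of lifecycle events, which is why the context manager form is usually preferred: the cleanup step cannot be forgotten.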
Registering Pages in the Hub
The hub can be populated with product pages. Once a page is added to the hub,
it receives the hub’s shared client session and task runner. It is also
subscribed to the hub’s events, and its context is populated with the hub’s
common top-level context data.
Note
If a certain key is present in both the hub and page context, the hub’s value takes precedence and overwrites the page’s value.
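The precedence rule can be pictured as a dictionary merge in which the hub's entries are applied last. This is only a sketch with plain dictionaries and made-up keys; how the library merges the contexts internally is not shown in this document.

```python
# Hypothetical context contents, chosen only for illustration.
page_context = {'region': 'page-value', 'page_only': 1}
hub_context = {'region': 'hub-value', 'hub_only': 2}

# Later entries win in a dict merge, so the hub's value replaces the
# page's value for the shared key, while unique keys are all kept.
merged = {**page_context, **hub_context}
print(merged)
```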
Creating New Pages
A straightforward way to add a new page to the hub is by using the new_page
method.
await hub.new_page(
    location_id=296,
    fetch_contents=True,
    trigger_handlers=False,
)
The example above adds a new page with ID 296 to the hub and fetches its data. The common registered event handlers are not triggered during this data update.
Adding Existing Pages
An existing page can be added to the hub by using the add_page method.
# ... assuming the page object is already created
await hub.add_page(
    page=page,
    update_contents=False,
    trigger_handlers=False,
)
The example above adds an existing page to the hub. The page data is not updated. The common registered event handlers are not triggered.
Scanning for Pages
The hub can automatically search for pages within a specified range of
location IDs by using the scan method. Its signature is similar to that of
the built-in range function, except that the stop parameter is inclusive.
await hub.scan(start=10, stop=20)
The example above scans for pages with IDs from 10 to 20, inclusive. The step parameter, not passed here, specifies the increment between consecutive IDs.
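Because stop is inclusive, the IDs covered by a scan differ from what the built-in range would yield for the same arguments. The helper below, scanned_ids, is a hypothetical name used only to illustrate the arithmetic; it does not call the library, and it assumes a positive step.

```python
def scanned_ids(start: int, stop: int, step: int = 1) -> list[int]:
    """Return the location IDs covered when `stop` is inclusive."""
    # range() excludes its upper bound, so extend it by one.
    return list(range(start, stop + 1, step))


ids = scanned_ids(10, 20)
ids_step = scanned_ids(10, 20, step=5)

# For comparison: the built-in range stops one ID earlier.
builtin_ids = list(range(10, 20))
```

With these arguments, ids covers eleven IDs ending at 20, whereas builtin_ids covers ten IDs ending at 19.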
Note
The execution time of the scan method depends on the ID range and the chosen
processing strategy: the larger the range, the longer the scan takes.
An initial scan with the default ID range of 1 to 1000 and a step of 1 may
take up to 10 minutes.
Accessing Pages
The pages in the hub can be accessed using the read-only pages attribute.
This attribute is a dictionary where the keys are the page IDs, and the values
are the corresponding page objects.
page = hub.pages.get(296)
The example above retrieves the page with ID 296 from the hub.
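Since pages is described as a dictionary, the usual dict semantics should apply: get returns None for an ID that is not registered instead of raising KeyError. The sketch below uses a plain dictionary with placeholder values as a stand-in for hub.pages.

```python
# Plain dict standing in for hub.pages; the string values are
# placeholders for page objects.
pages = {122: 'page-122', 296: 'page-296'}

page = pages.get(296)     # registered ID: the stored object
missing = pages.get(999)  # unregistered ID: None, no KeyError

if missing is None:
    print('Page 999 is not registered in the hub.')
```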
Removing Pages
A page can be removed from the hub by using the remove_page method.
A removed page receives a new client without an initialized session.
await hub.remove_page(296)
The example above removes the page with ID 296 from the hub.
Serializing Hub Data
The hub data is represented by a ProductPageHubData object, which is
a Pydantic model. It can be serialized and stored between application sessions.
import asyncio

from freshpointsync import ProductPageHub, ProductPageHubData

CACHE_FILE = 'hubData.json'


def dump_to_file(data: ProductPageHubData, file_path: str) -> None:
    print(f"Dumping data to cache file '{file_path}'...")
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write(data.model_dump_json(indent=4, by_alias=True))


async def main() -> None:
    print('Initializing hub session...')
    async with ProductPageHub(enable_multiprocessing=True) as hub:
        print('Searching for pages in range 10 to 20...')
        await hub.scan(start=10, stop=20)
        print('Dumping hub data to file...')
        dump_to_file(hub.data, CACHE_FILE)


if __name__ == '__main__':
    asyncio.run(main())
In the example above, the hub scans for pages with IDs from 10 to 20.
The resulting page data is dumped to a JSON file. The data can be loaded back
into the hub by providing a ProductPageHubData object to the constructor.
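On the next application start, the cache can be read back before deciding whether a fresh scan is needed. The sketch below covers only the file-handling side using the standard library; load_cached_json is a hypothetical helper. Turning the parsed dictionary into a ProductPageHubData object (for example with Pydantic's model_validate) and passing it to the ProductPageHub constructor is left out, because the constructor's parameter name is not given in this document.

```python
import json
from pathlib import Path
from typing import Any, Optional

CACHE_FILE = 'hubData.json'


def load_cached_json(file_path: str) -> Optional[dict[str, Any]]:
    """Return the cached hub data as a dict, or None if unusable."""
    path = Path(file_path)
    if not path.is_file():
        return None  # no cache yet, e.g. on the first run
    try:
        with path.open(encoding='utf-8') as f:
            data = json.load(f)
    except json.JSONDecodeError:
        return None  # corrupt or truncated cache: fall back to a scan
    return data if isinstance(data, dict) else None


cached = load_cached_json(CACHE_FILE)
if cached is None:
    print('No usable cache found; a fresh scan is needed.')
```

Returning None for both a missing and an unreadable file keeps the calling code simple: either the cache is usable, or the hub is populated by scanning.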
The enable_multiprocessing parameter of the ProductPageHub constructor
enables multiprocessing for the hub. When it is enabled, the hub uses
multiple processes to parse the fetched product page data. This can
significantly speed up data retrieval, but Python’s multiprocessing module
has some limitations and should be used with caution. See the
concurrent.futures documentation for more information.
Note
The full dumped JSON data for every existing page may take up to 80 MB of
disk space. You can exclude specific fields from serialization by providing
the exclude parameter to the model_dump and model_dump_json
methods. For example, to exclude the product descriptions from the dumped
data, you can use the following syntax:
data = hub.data.model_dump(
    exclude={'pages': {'__all__': {'products': {'__all__': {'info'}}}}}
)