
ScenePalette: Contextually Exploring Object Collections Through Multiplex Relations in 3D Scenes

Shao-Kui Zhang, Wei-Yu Xie, Chen Wang, Song-Hai Zhang

Zhang SK, Xie WY, Wang C et al. ScenePalette: Contextually exploring object collections through multiplex relations in 3D scenes. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 39(5): 1180−1192 Sept. 2024. DOI: 10.1007/s11390-022-2194-6. CSTR: 32374.14.s11390-022-2194-6.


Funds: This work was supported by the National Natural Science Foundation of China under Grant No. 61832016, the Key Research Projects of the Foundation Strengthening Program of China under Grant No. 2020JCJQZD01412, and Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.
    Author Bio:

    Shao-Kui Zhang received his Ph.D. degree in computer science and technology from Tsinghua University, Beijing, in 2023. He is currently a postdoctoral researcher in the Department of Computer Science and Technology at Tsinghua University, Beijing. His research interests include computer graphics, 3D AIGC, and multimedia applications.

    Wei-Yu Xie is a Ph.D. student in the Department of Computer Science and Technology at Tsinghua University, Beijing. He received his B.S. degree in computer science and technology from Beijing Institute of Technology, Beijing, in 2021. His research interests include computer architecture, operating systems, and computer graphics.

    Chen Wang is a Master's student in the Department of Computer Science and Technology at Tsinghua University, Beijing, from where he also received his B.S. degree in computer science and economics in 2020. His research interests include computer graphics and computer vision.

    Song-Hai Zhang received his Ph.D. degree in computer science and technology from Tsinghua University, Beijing, in 2007. He is currently an associate professor in the Department of Computer Science and Technology at Tsinghua University, Beijing. His research interests include computer graphics and virtual reality.

    Corresponding author:

    Song-Hai Zhang: shz@tsinghua.edu.cn

  • Research Summary:
    Research Background

    Real-world 3D scenes consist of contextually reasonable organizations of objects; for example, people typically place one double bed, rather than beds of different shapes, together with several subordinate objects in a bedroom. However, existing work explores the representations of 3D objects only according to their appearances.

    Objective

    This paper studies how to organize and represent collections of 3D objects efficiently. We observe that for objects arranged in 3D scenes, contextual organization matters more than visual appearance. Therefore, we construct a latent space of objects from their contexts, which in turn allows users to generate 3D scenes interactively.

    Method

    We extract contextual relations of objects in 3D scenes at different levels and construct a multiplex network from these relations. We then pull contextually similar objects closer in the latent space and push contextually dissimilar objects apart, thereby generating a reasonable latent space of 3D objects.

    Results

    We find that, compared with traditional methods, the latent space of 3D objects generated from contexts models the relations between objects more accurately. Based on the resulting object representations, we build related applications, such as efficiently exploring the contexts of 3D objects and generating 3D scenes through user interaction.

    Conclusion

    This paper demonstrates the effectiveness of constructing latent spaces of 3D objects from contexts. The proposed method has significance and potential for applications involving 3D scenes. Future work should further explore designing representations of 3D scenes that follow human intuition.

    Abstract:

    This paper presents ScenePalette, a modeling tool that allows users to “draw” 3D scenes interactively by placing objects on a canvas based on their contextual relationships. ScenePalette is inspired by an important intuition that was often ignored in previous work: a real-world 3D scene consists of the contextually reasonable organization of objects, e.g., people typically place one double bed with several subordinate objects into a bedroom instead of different shapes of beds. ScenePalette abstracts 3D repositories as multiplex networks and accordingly encodes implicit relations between or among objects. Specifically, basic statistics such as co-occurrence, in combination with advanced relations, are used to tackle object relationships of different levels. Extensive experiments demonstrate that the latent space of ScenePalette has rich contexts that are essential for contextual representation and exploration.

  • Existing work shows the benefits of leveraging contextual object datasets[1, 2] for applications and research on 3D models. Contextual object collections offer computer-aided design (CAD) models along with their arrangements (or layouts) in 3D scenes, providing spatial priors that facilitate automatic scene synthesis[3-5] and interactive scene synthesis[6, 7]. Handa et al.[8] synthesized indoor scene datasets from different views of 3D scenes, and Luo et al.[9] optimized layouts of objects given scene graphs with neural networks. Nevertheless, previous literature visualizes a collection of objects using their appearances, such as shapes, materials, or colors[10, 11], instead of a contextual representation, as shown in Fig.1.

    Figure  1.  Latent contextual space. The three big circles refer to three different types of rooms. The small circles refer to different types of objects. An exploratory path exists from bedroom to living room and to kitchen, in which multiple objects are suitable for more than one type of room in the overlapped area.

    The design of 3D scenes is analogous to the painting procedure in that people add objects sequentially to the palette. In this paper, we propose ScenePalette to assist this process, which enables users to contextually explore and manipulate datasets of objects by encoding relations between (or among) objects. For example, given a double bed, our system automatically pops relevant objects around the mouse cursor, such as a nightstand. Our motivation comes from the observation that embedding objects with homogeneous relations into a latent space benefits these applications[12]. Therefore, we extract various potential contexts from the datasets into visual planes, which is associated with the problem of multiplex network embedding (MNE)[13].

    However, incorporating contexts into the collections of 3D shapes is inherently difficult. First, multiple potential contexts exist heterogeneously, e.g., co-occurrence simply counts the co-existence of objects, while tests for complete spatial randomness are typically measured in the Euclidean space. Second, there are implicit relations to be extracted, i.e., how two objects are indirectly correlated and how many “intermediaries” are implicit, e.g., a contextual relation between two visually similar office chairs concerning a desk. Third, a relation is likely to involve multiple objects[14]. Therefore, multiple graphs are used instead of a simple graph.

    With the above difficulties, MNE[13] seems to be applicable, which can generate a consistent latent space given a graph with multiple heterogeneous edges between two vertices or a graph containing multiple heterogeneous layers. However, MNE assumes the completeness of datasets or at least the balance of datasets[15]. Our problem focuses on geometric datasets, and it is common to see two similar entities with totally different frequencies of usage[1, 5] and existence in rooms. Additionally, a typical 3D scene dataset contains a set of objects and their arrangements, e.g., bedrooms, while MNE extracts more “direct” relations between objects, such as relative transformations[5, 16, 17]. We also resort to DeepWalk[18], where a lack of edges pulls apart vertices in the vector space during embedding. Nevertheless, it is impossible to quantify the absence of edges for 3D object collections.

    Therefore, the core concern is how to utilize the basic object correlations in the dataset, e.g., co-occurrence, while considering the imbalance of data distribution. We still treat the object relation as a multiplex network where vertices represent 3D shapes and edges represent their contextual relations similar to MNE. Our framework outputs a latent space with respect to the input multiplex network. However, formulating a multiplex network using geometric datasets is non-trivial. Unlike network embedding, where the given datasets are often already structured[19, 20], object contexts are subjective and merely implied between objects, despite several directly accessible or computable attributes. Besides, different types of relations measure distance differently, which leads to contradictory layers in a multiplex network. Furthermore, combining various object relations into the layers of a multiplex network is inevitable.

    To address these concerns, we start with commonly used statistics such as co-occurrence and then combine them with advanced and parent-level relations to extract the implicit and high-level relationships between objects, which also aims to resolve the imbalanced dataset problem. We optimize the latent space with gradient descent of the overall loss, a weighted sum of different relations. Our experiment shows that the resulting latent space fully reflects the constraints implied by the given multiplex network. For example, the spatial correlations and co-occurrence between objects in the multiplex network bring similar and complementary objects nearer in the latent space, e.g., an office chair and a desk. The latent contextual space also accords with human intuitions. It preserves the graph properties, which proves beneficial for many applications, including contextual representation such as object classification, contextual exploration, and design of 3D scenes.

    Our contributions in this paper can be summarized as follows.

    • We propose ScenePalette, a framework that permits the exploration and retrieval of objects and their background contexts, enabling intelligent interactions between users and 3D scenes.

    • We encode implicit relations between or among objects with a multiplex network, allowing us to synthesize a contextually rich latent space for 3D objects.

    • We implement a web application for interactive visualization and manipulation of object collections.

    The remainder of our paper is organized as follows. After reviewing the relevant work in Section 2, various ways to formulate relations that compose the multiplex networks are illustrated in Section 3. Then, we illustrate how we generate latent spaces given the formulated relations in Section 4. In Section 5, we implement our method and demonstrate its effectiveness compared with baselines. In addition, two web applications are presented; they utilize the proposed object-embedding method to help users efficiently explore furniture objects. In Section 6, we discuss the limitations of our method. Finally, in Section 7, we conclude the paper and discuss future directions.

    Context-based modeling and analysis refer to research incorporating contexts between (among) objects in addition to geometry, label, etc. Fisher et al.[21] and Xu et al.[22] retrieved objects according to their contexts, i.e., by considering spatial relations between objects. Several applications of 3D indoor scene synthesis mathematically formulate contexts into priors for guiding arrangements of objects, e.g., [16, 23]. Later, Fisher et al.[24] and Xu et al.[25] contextually organized and compared 3D indoor scenes through encoding contexts as graphs. However, existing literature considers contexts as independent contents of objects. Instead, we explicitly bring contexts back to objects as their attributes, i.e., each object has a unique feature describing its contexts.

    Network embedding captures topological structures of vertices and edges by converting graph nodes to vector-based representations[26]. The studied networks begin with homogeneous unweighted networks[27] and then include attributed weighted networks[12]. Network embedding has been proven to be useful for many real-world problems. In computer vision, the latent embedding of human joints is closer within the same person and farther apart among different people[28]. When comparing shape similarities, [29] orders objects based on their appearances, which was previously referred to as “ordination”[30], but it typically considers visual differences. More recent research explored MNE[13, 31], where the incorporated networks become multiplex networks. DeepWalk[18] is also a network embedding technique, but it does not quantitatively calculate the absence of edges, for example, how different two categories are quantitatively. Additionally, as experimentally verified in [32], given a weighted network lacking relations between vertices, link prediction performs even worse, which will be further amplified in multiplex networks and imbalanced datasets. Consequently, all the mentioned literature cannot be directly applied to our problem though we also operate on multiplex networks. Specifically, our paper focuses on contexts among objects, which prompts the need to formulate a specific multiplex network and obtain latent spaces correspondingly instead of merely embedding objects.

    3D shape similarity and content-based 3D shape retrieval are classical topics in computer graphics that aim to derive the visual distance between 3D shapes[33]. Geometric approaches such as SHED[29], SPH[34], and Shape Distribution[35] are intuitive ways to achieve this goal. Another line of work uses light field descriptors[11, 36] that render 3D shapes into a series of images, followed by image-based metrics for evaluations. Object collections can also be organized by quadruples, as in [10]. Recently, advances in deep learning have enabled neural networks to extract global[37] or local features[38] of 3D shapes. Essentially, shape similarity can be formulated as a layer in a multiplex network so that contexts are further utilized based on geometries (see Subsection 3.3).

    As discussed in Section 1, non-trivial formulations for multiplex networks are necessary before embedding. This section details how we formulate various relations of 3D objects as constraints and construct a multiplex network accordingly. An overview of our formulation can be found in Fig.2. We use a multiplex network G=(N,E) with N nodes to represent the given set of objects. E_{i,\ j}^{a} indexes the edge between object i and object j at layer a , i.e., the relation of type a between objects i and j . Matrix {\boldsymbol{M}}_{a} is used to represent a particular layer a in G . Fig.3 shows the latent spaces of different layers, and the definition of each layer is introduced in the following subsections.

    Figure  2.  Multiplex network formulation (Section 3). Given (a) 3D scenes, (b) we extract potential relations among objects, including basic constraints, advanced constraints, and parent-level constraints, (c) to generate a consistent multiplex network.
    Figure  3.  Investigating different effects of each layer in the multiplex network. Colors distinguish different categories of objects. Note that, the latent space in each subfigure is learned from a particular layer. (a) Category. (b) Co-occurrence. (c) Normalized co-occurrence. (d) Truncated co-occurrence. (e) Exponential co-occurrence. (f) Context-type signature. (g) Tests for CSR. (h) Normalized CSR. (i) Truncated CSR. (j) Exponential CSR. (k) LFD-Zer[11]. (l) LFD-Hu[11]. (m) SHED[29]. (n) PointNet[37]. (o) 500-order fop. (p) Semi-category co-occurrence.
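
    To make this notation concrete, the following is a minimal Python sketch of how the multiplex network can be stored as one N x N matrix per layer; the class name, storage layout, and helper names are ours for illustration and are not from the released implementation.

```python
import numpy as np

class MultiplexNetwork:
    """One N x N relation matrix M_a per layer a of the multiplex network G."""

    def __init__(self, num_objects):
        self.n = num_objects
        self.layers = {}  # layer name -> (n, n) numpy array

    def set_layer(self, name, matrix):
        matrix = np.asarray(matrix, dtype=float)
        assert matrix.shape == (self.n, self.n)
        self.layers[name] = matrix

    def edge(self, i, j, layer):
        """E^a_{i,j}: the relation of type `layer` between objects i and j."""
        return self.layers[layer][i, j]
```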

    Category. Each object has a label indicating its category. Objects with the same label are coarsely interchangeable regardless of mislabelling or particular combos that bind several pieces of furniture together. An indicator matrix {\boldsymbol{M}}_{\text{cat}} is defined as (1), where cat(o) denotes the category label of the object o , such as “coffee table” or “double bed”. This metric is a 0-1 metric, which returns 1 if two objects have the same category, e.g., two coffee tables are interchangeable in a living room. It returns 0 if two objects have distinct categories, e.g., a nightstand and a sofa, which play different roles in typical home decoration.

    \begin{aligned} M_{\text{cat}}^{i,\ j}= \begin{cases} 1, & \text{if }\; cat(o_i) = cat(o_j), \\ 0, & \text{otherwise.} \end{cases} \end{aligned} (1)

    Co-occurrence is a widely-used metric[3, 17] that measures how often two objects co-exist in a room. Therefore, co-occurrence is unavailable for datasets without “rooms”, “levels”, or “houses”, e.g., ShapeNet[39]. Similar to [3], we use a matrix {\boldsymbol{M}}_{\text{coo}} to denote co-occurrence, as shown in (2). RM = \{r_k | k = 1, \;2,\; 3,\; \ldots ,\; m\} is a collection of m rooms and each room r_k = \{o_l | l = 1,\; 2, \;3, \;\ldots ,\; |r_k|\} is a set of objects (furniture), where | \cdot | counts the number of elements of a set. Thus, the co-occurrence between an object o_i and an object o_j is the ratio of the number of rooms where o_i and o_j co-exist to the total number of rooms in a dataset. For example, if a dataset contains a hundred rooms where 20 rooms contain both o_i and o_j , then M_{\text{coo}}^{i,\ j} will yield 0.2 in this case.

    \begin{aligned} M_{\text{coo}}^{i,\ j} = \frac{|\{ r | o_i, o_j \in r, r \in RM \}|}{|RM|}. \end{aligned} (2)
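
    A minimal sketch of computing (2), assuming each room is given as a set of integer object indices; the function and variable names are ours for illustration, and self-pairs on the diagonal are ignored.

```python
from itertools import combinations
import numpy as np

def co_occurrence(rooms, num_objects):
    """M_coo[i, j]: fraction of rooms in which objects i and j co-exist (Eq. (2))."""
    m_coo = np.zeros((num_objects, num_objects))
    for room in rooms:                           # each room: a set of object indices
        for i, j in combinations(sorted(room), 2):
            m_coo[i, j] += 1
            m_coo[j, i] += 1
    return m_coo / len(rooms)
```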

    Normalized Co-Occurrence. In practice, all elements of {\boldsymbol{M}}_{\text{coo}} tend to be close to 0 because of a high |RM| . Therefore, we normalize {\boldsymbol{M}}_{\text{coo}} to M_{\text{coon}}^{i,\ j} and M_{\text{coon}}^{i,\ j} = ({M_{\text{coo}}^{i,\ j} - \overline{{\boldsymbol{M}}_{\text{coo}}}})/{S_{\text{coo}}} , where \overline{{\boldsymbol{M}}_{\text{coo}}} is the mean of co-occurrence and S_{\text{coo}} is the standard deviation.

    Truncated Co-Occurrence. To determine whether or not two objects are correlated, we typically truncate the co-occurrence matrix {\boldsymbol{M}}_{\text{coon}} to derive a binary relation {\boldsymbol{M}}_{\text{coot}} , as shown in (3). Our truncation threshold is set to 0.

    \begin{aligned} M_{\text{coot}}^{i,\ j}= \begin{cases} 1, & \text{if } \;M_{\text{coon}}^{i,\ j} \geqslant 0, \\ 0, & \text{otherwise.} \end{cases} \end{aligned} (3)

    Exponential Co-Occurrence. Finally, because the datasets typically follow a “long-tail” distribution[5], we alter relations with extremely high co-occurrence in an exponential manner, i.e., M_{\text{cooe}}^{i,\ j} = 1- \exp(-|RM| M_{\text{coo}}^{i,\ j}) . Therefore, relations with large {\boldsymbol{M}}_{\text{coo}} values will be penalized.
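
    The three variants above can be derived directly from {\boldsymbol{M}}_{\text{coo}} ; a short sketch, again with hypothetical names, might look like the following.

```python
import numpy as np

def co_occurrence_variants(m_coo, num_rooms):
    """Normalized, truncated (Eq. (3)), and exponential variants of the co-occurrence layer."""
    m_coon = (m_coo - m_coo.mean()) / m_coo.std()   # normalized co-occurrence
    m_coot = (m_coon >= 0).astype(float)            # truncated at threshold 0
    m_cooe = 1.0 - np.exp(-num_rooms * m_coo)       # exponential co-occurrence
    return m_coon, m_coot, m_cooe
```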

    Context-Type Signature. {\boldsymbol{ct}}_i is a vector of object o_i whose entries are proportional to how many times it occurs in each type of context. As shown in (4), type(r) returns the type of context r , and the relation of context-type is given by the dot product M_{\text{ct}}^{i,\ j} = {\boldsymbol{ct}}_{i} \cdot {\boldsymbol{ct}}_{j} . For example, a king-sized bed \hat{i} only occurs in various master-bedrooms. A nightstand \hat{j} may appear in several kinds of bedrooms besides master-bedrooms (e.g., second-bedrooms, kids-rooms). Finally, a king-sized dining table \hat{k} only appears in dining rooms. M_{\text{ct}}^{\hat{i},\ \hat{j}} is undoubtedly much larger than M_{\text{ct}}^{\hat{i},\ \hat{k}} . M_{\text{ct}}^{\hat{i},\ \hat{k}} is close to 0 since, unlike \hat{i} and \hat{j} , \hat{i} and \hat{k} rarely appear in the same kind of room.

    \begin{aligned} {{ct}}_{i}^{d} = \frac{|\{ r | o_i \in r, type(r)=d \}|}{|\{ r | o_i \in r \}|}, r \in RM. \end{aligned} (4)
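
    A sketch of building this layer from room annotations, assuming each room carries a type label such as “master-bedroom”; the names are ours for illustration.

```python
import numpy as np

def context_type_layer(rooms, room_types, num_objects):
    """M_ct[i, j] = ct_i . ct_j, where ct_i[d] is the fraction of rooms containing
    object i that are of context type d (Eq. (4))."""
    types = sorted(set(room_types))
    type_idx = {t: k for k, t in enumerate(types)}
    ct = np.zeros((num_objects, len(types)))
    for room, rtype in zip(rooms, room_types):          # rooms: sets of object indices
        for o in room:
            ct[o, type_idx[rtype]] += 1
    ct /= np.maximum(ct.sum(axis=1, keepdims=True), 1)  # per-object normalization
    return ct @ ct.T
```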

    Tests for CSR were proposed in the last century[40] and measure how a series of points are distributed w.r.t. a homogeneous Poisson process (planar Poisson process). Tests for CSR are applied in ecology[41], e.g., whether a group of observed plants is located with a specific pattern. Rosin[42] is perhaps the first to introduce CSR in computer vision to detect pixels of white noise in images. Zhang et al.[5] then applied tests for CSR to 3D indoor scene synthesis and measured the strengths of spatial relations.

    Ways to practically calculate tests for CSR are investigated in [40, 43], and in this paper, we quantify spatial chaos between objects as in [44]. Specifically, an inferior angle can be defined for each point (relative positions in this paper) according to the first two nearest relative positions, e.g., \angle QOP , as shown in Fig.4. Thus, under CSR, a completely random position set asymptotically forms an angle set concerning the empirical distribution function[44]. The detailed formulation can be found in (5), where F_{c}(\theta^{i,\ j}) and F_{e}(\theta^{i,\ j}) are the cumulative distribution function of {x}/{{\pi}} for x in (0, \pi) and the empirical distribution function of “angles”, respectively. m is the number of points sampled from the dataset.

    Figure  4.  Testing spatial randomness by angles. Each point can derive an inferior angle using its nearest and second nearest points. For example, Q and P are the nearest and second nearest points of O, respectively; therefore, \angle QOP is the angle formed for O.
    \begin{aligned} M_{\text{csr}}^{i,\ j} = \sqrt{m}\sup|F_{c}(\theta^{i,\ j})-F_{e}(\theta^{i,\ j})|. \end{aligned} (5)
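
    A possible implementation of this angle-based statistic, assuming the relative positions are given as 2D points and reading F_{c} as the uniform CDF \theta/\pi on (0, \pi) ; this is our sketch of [44], not the released code.

```python
import numpy as np

def csr_statistic(points):
    """Angle-based test statistic for complete spatial randomness (Eq. (5)).
    points: (m, 2) array of relative positions sampled for one pair of objects, m >= 3."""
    m = len(points)
    angles = []
    for k, o in enumerate(points):
        dist = np.linalg.norm(points - o, axis=1)
        dist[k] = np.inf                                    # ignore the point itself
        q, p = points[np.argsort(dist)[:2]]                 # nearest, second nearest
        u, v = q - o, p - o
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))   # inferior angle at O
    theta = np.sort(angles)
    f_c = theta / np.pi                                     # CDF of a uniform angle on (0, pi)
    f_e = np.arange(1, m + 1) / m                           # empirical CDF at sorted angles
    return np.sqrt(m) * np.max(np.abs(f_c - f_e))
```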

    3D shape similarity measures the visual distance between objects. Previous techniques typically focus only on one measurement aspect, and our study shows that a top-down combination of different metrics can yield better results. We first adopt shape edit distance (SHED)[29] to distinguish objects of significant geometric distinctions. We then employ the light field descriptor (LFD)[11] for more detailed texture measurement, i.e., a visual difference of rendered images. Finally, PointNet[37] is deployed for delicate geometry comparison.

    SHED[29] measures the amount of effort to rearrange the parts of an object to match those of another object, possibly removing or adding parts. We pre-segment objects using the approach proposed by van Kaick et al.[45] The segmentation result and part matching of SHED are shown in Fig.5(a).

    Figure  5.  (a) Measuring shape edit distance (SHED) with pre-segmentation and part matching[29]. (b) Rendering a light field[11] of an icosahedron.

    LFD[11] typically renders images according to a light field of a dodecahedron and compares visual similarity at the image level. In this paper, to increase confidence, as discussed in [11], we use a light field of an icosahedron, as shown in Fig.5(b). In addition to Zernike moments, we also include Hu moments[46] to achieve more visually intuitive results.

    PointNet[37] is a state-of-the-art architecture that can extract global geometric features of 3D shapes. In this paper, we re-train PointNet on our dataset and use it for extracting descriptors of 3D models. The similarity of objects can thus be defined as the dot product of their descriptors.

    Arbitrary-Order Proximity. According to graph theory and network embedding[12], the relations discussed in Subsection 3.1 and Subsection 3.2 are all first-order proximities, i.e., relations of objects are computed directly from their contexts rather than through indirect relations. An \alpha -th order proximity is formulated as {\boldsymbol{M}}_{\text{aop}} = ({\boldsymbol{M}}_{\text{fop}})^{\alpha} . Similar to finite-state Markov chains[47], high-order proximity can propagate similarities across objects that never occur simultaneously through their mutual neighbors. Because the number of objects in a room can be arbitrary, we iterate {\boldsymbol{M}}_{\text{aop}} 500 times to fully exploit the potential of the dataset. Fig.3(o) shows a 500-order normalized co-occurrence.
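
    A sketch of the iterated proximity; the row normalization inside the loop is our addition to keep 500 matrix products numerically bounded and is not prescribed by the formulation above.

```python
import numpy as np

def arbitrary_order_proximity(m_fop, alpha=500):
    """M_aop = (M_fop)^alpha: propagates proximity between objects that never co-occur
    through their mutual neighbors."""
    m = np.array(m_fop, dtype=float)
    for _ in range(alpha - 1):
        m = m @ m_fop
        m /= np.maximum(m.sum(axis=1, keepdims=True), 1e-12)  # keep rows bounded (our choice)
    return m
```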

    In datasets of 3D scenes, occurrences of different pieces of furniture are not well balanced. Taking two pieces of furniture of the same category, similar textures, and indistinguishable geometries as an example, one may appear a thousand times while the other occurs fewer than ten times.

    To tackle this imbalance issue, we propose a parent-level similarity, which weighs relations at a level above individual instances. We define the parent-level similarity as a matrix {\boldsymbol{M}}_{\text{p}} , as defined in (6), given a base matrix {\boldsymbol{M}}_{\text{base}} and a first-order proximity {\boldsymbol{M}}_{\text{fop}} .

    \begin{aligned} M_{\text{p}}^{i,\ j} = \lambda_{\text{p}} \frac{\displaystyle\sum\limits_{k} {M}_{\text{base}}^{i,\ k} {M}_{\text{fop}}^{k,\ j}}{\displaystyle\sum\limits_{k} {M}_{\text{base}}^{i,\ k}} + (1-\lambda_{\text{p}})M_{\text{fop}}^{i,\ j}. \end{aligned} (6)

    As shown in (6), we first explore a set of related objects indexed by k , with correlations (weights) indicated by a base matrix {\boldsymbol{M}}_{\text{base}} . {\boldsymbol{M}}_{\text{base}} represents the inherent similarity between objects, e.g., category or shape. In this paper, we take the average of {\boldsymbol{M}}_{\text{cat}} and the shape similarity in Subsection 3.2 as the matrix {\boldsymbol{M}}_{\text{base}} . We then calculate the weighted sum of the first-order proximities M_{\text{fop}}^{k,\ j} with weights given by {\boldsymbol{M}}_{\text{base}} , and normalize it by the summation \sum_{k} M_{\text{base}}^{i,\ k} . Intuitively, in a particular set of objects, if one object o has an abnormally high occurrence and the rest all have low occurrences, {\boldsymbol{M}}_{\text{base}} and \lambda_{\text{p}} are used to redistribute the first-order proximity M_{\text{fop}}^{k,\ j} of o to the others. Fig.3(p) shows parent-level relations using merely {\boldsymbol{M}}_{\text{cat}} as {\boldsymbol{M}}_{\text{base}} .
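
    A direct transcription of (6), with \lambda_{\text{p}} = 0.5 as in Subsection 5.1; the function name is ours for illustration.

```python
import numpy as np

def parent_level(m_base, m_fop, lam_p=0.5):
    """Parent-level relation M_p of Eq. (6): the first-order proximity of an object is
    blended with that of its 'relatives' (objects close under M_base) to counter the
    imbalanced occurrences in the dataset."""
    weights = m_base / np.maximum(m_base.sum(axis=1, keepdims=True), 1e-12)
    return lam_p * (weights @ m_fop) + (1.0 - lam_p) * m_fop
```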

    After obtaining a series of layers, we next generate latent vectors of objects. The latent space is denoted by V \in \mathbb{R}^{n \times d} , where n is the number of objects and d is the dimension of latent vectors. Initially, we randomly initialize the latent vectors of objects using the standard Gaussian distribution. Subsequently, we elaborate on how the loss functions are defined and used to derive the final latent spaces through an optimization process, with the relations established in Section 3.

    \begin{aligned} {\cal{L}} = \sum_{cst \in C} \omega_{cst} \psi_{cst}({\boldsymbol{X}}). \end{aligned} (7)

    (7) shows the overall loss function, a linear combination of losses on each constraint (relation) cst in the constraint set C . C is selected from the relations introduced in Section 3 and the details can be found in Subsection 5.1. The individual loss \psi_{cst}({\boldsymbol{X}}) takes as input the pairwise L2-norm matrix {\boldsymbol{X}} that is generated according to the latent space V :

    \begin{aligned} {X}^{i,\ j} = ||{\boldsymbol{v}}_i - {\boldsymbol{v}}_j||, \end{aligned}

    where {\boldsymbol{v}}_i and {\boldsymbol{v}}_j are the corresponding latent vectors for objects i and j , respectively. For a normalized layer falling in the range (-\infty, \infty) , \psi_{cst}({\boldsymbol{X}}) is defined as \sum_{i,\ j}^{ }X^{i,\ j}M_{cst}^{i,\ j} . For a layer falling in the range [0, 1] , \psi_{cst}({\boldsymbol{X}}) is defined as (8) and (9), which follow a “push-pull” metric \phi({\boldsymbol{X}}) . \phi({\boldsymbol{X}}) has two parts: \varphi^{\text{near}}( \cdot ) and \varphi^{\text{far}}( \cdot ) . The reason is that if we only push latent vectors close to each other given a multiplex network, it is easy to imagine that all of them will eventually collapse to a single point.

    \begin{aligned} \psi({\boldsymbol{X}}) &= (1-\lambda_{\text{fn}})\varphi^{\text{near}}({\boldsymbol{X}}) + \lambda_{\text{fn}}\varphi^{\text{far}}({\boldsymbol{X}}) \end{aligned} (8)
    \begin{aligned} & = \sum_{i,\ j} X^{i,\ j} M_{cst}^{i,\ j} + \lambda_{\text{fn}}\exp\frac{-(X^{i,\ j}(1 - M_{cst}^{i,\ j}))^{2}}{2\sigma^{2}}, \end{aligned} (9)

    where \varphi^{\text{near}}({\boldsymbol{X}})=\sum_{i,j} X^{i,\ j} M_{cst}^{i,\ j} pushes two objects closer concerning a given matrix {\boldsymbol{M}}_{cst} , i.e., if o_i and o_j are close to each other as described in {\boldsymbol{M}}_{cst} , they cannot have a large norm X^{i,\ j} . Therefore, the gradient of \varphi^{\text{near}}({\boldsymbol{X}}) will be large if o_i and o_j have a significant distance and vice versa. On the other hand, the second term \varphi^{\text{far}}({\boldsymbol{X}}) pulls objects apart. Analogous to the Gaussian distribution, the longer the distances between objects are, the smaller the exponential \exp({ -( \cdot ) }/({2\sigma^{2}})) will be. Specifically, we use the gradient of \varphi^{\text{far}}({\boldsymbol{X}}) to penalize those objects that wrongly become close to each other. For example, if o_i and o_j are far away according to {\boldsymbol{M}}_{cst} , then the gradient will be prominent in the case of a small norm X^{i,\ j} . An additional hyperparameter \lambda_{\text{fn}} is used to balance \varphi^{\text{near}} and \varphi^{\text{far}} , with greater \lambda_{\text{fn}} leading to “sparser” latent spaces.

    The contribution of a constraint cst to the final latent space is controlled by its weight \omega_{cst} . In multiplex network embedding, \omega_{cst} is often called “interplay”[31] or “cross-layer dependency”[48]. Since the weights are not directly available in the geometric datasets, we treat \omega_{cst} as a hyperparameter.
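
    Putting the above together, a minimal PyTorch sketch of the push-pull loss and the weighted optimization of (7), assuming each layer matrix is supplied as a torch tensor. The use of plain SGD, \sigma = 1 , and the small epsilon inside the square root are our assumptions, and the sketch covers only [0, 1] -valued layers.

```python
import torch

def push_pull_loss(v, m_cst, lam_fn=0.5, sigma=1.0):
    """Push-pull loss of Eqs. (8) and (9) for one [0, 1]-valued layer M_cst.
    v: (n, d) latent vectors; the near term pulls related objects together,
    the far term pushes unrelated ones apart."""
    diff = v.unsqueeze(0) - v.unsqueeze(1)
    x = torch.sqrt((diff ** 2).sum(-1) + 1e-9)     # pairwise norms X (eps avoids NaN gradients)
    near = (x * m_cst).sum()
    far = torch.exp(-(x * (1.0 - m_cst)) ** 2 / (2.0 * sigma ** 2)).sum()
    return (1.0 - lam_fn) * near + lam_fn * far

def optimize_latent_space(layers, weights, n, d=16, iters=50, lr=1e-3):
    """Minimize the weighted sum of layer losses (Eq. (7)) by gradient descent."""
    v = torch.randn(n, d, requires_grad=True)      # standard Gaussian initialization
    opt = torch.optim.SGD([v], lr=lr)
    for _ in range(iters):
        loss = sum(w * push_pull_loss(v, m) for m, w in zip(layers, weights))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return v.detach()
```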

    Datasets. We utilize 3D-Front[2], which contains 9992 usable 3D models and more than 70000 scene configurations. Therefore, abundant object contexts are available for formulating multiplex networks (Section 3). Additionally, we craft more CAD models, such as televisions, rugs, and laptops, which are not included in 3D-Front. As shown in Fig.6, a platform is also developed to support user-friendly interaction with objects, such as translation, rotation, and insertion. We use this platform to create more contexts of objects, e.g., typical bedrooms and living rooms. The following experiments will show more applications of this platform with the optimized latent spaces.

    Figure  6.  (a) 3D scene platform developed for evaluating latent spaces. (b) It allows the creation of more contexts among objects with object transformation defined by users. (c) It supports other potential applications, as in Subsection 5.3.

    Implementation Details. The first hyperparameter we need to determine is the dimension d . Similar to network embedding, choices of d depend on what kind of manifold we desire[12]. Fig.7 shows the influence of d on the final result. We use t-SNE[49] for dimension reduction. The threshold of normalized relations is chosen as 0 to distinguish positive and negative effects. We use a learning rate of 0.001. \lambda_{\text{p}} in (6) and \lambda_{\text{fn}} in \phi( \cdot ) are both tuned to 0.5 empirically. For constraints, we choose normalized co-occurrence, parent-level relations, context-type signature, categories, and exponential tests for CSR, which cover the proposed layers formulating the multiplex network. Note that the layers of the shape similarities are used in the parent-level relations since we focus on embedding contexts and avoid directly using shapes as layers. The maximum number of iterations is empirically set to 50 since more iterations do not experimentally optimize the space further. Fig.8 shows an iterative process of generating a particular latent space under a multiplex network. We utilize a GTX-970 GPU for both network formulation and latent space optimization. It takes 10 seconds to optimize a 16-dimensional latent space with a 6-layer multiplex network.

    Figure  7.  Latent spaces with dimensions d = 2, 8, 16 for (a), (b) and (c), respectively, using the same multiplex network. The results show that our method captures as many contextual representations as possible while remaining robust to the dimension d .
    Figure  8.  Generating a latent space from (a) a random Gaussian initialization. As the optimization process proceeds, the latent space is updated until the overall loss is sufficiently small or it reaches the maximum number of iterations. Note that colors denote types of objects, where we can see the gradual changes of types with respect to room types. The remaining subfigures refer to (b) 10, (c) 20, (d) 30, (e) 50 and (f) 100 iterations, respectively.

    Results are shown in Fig.8. Note that latent spaces are non-unique and can vary with different multiplex networks and selections of \omega_{cst} . Since a latent space contains thousands of objects, we show two subspaces of the whole latent space in Fig.9. Given an object, several contextually related objects are rendered based on their distances to the given object. We also develop a web platform that enables users to explore and retrieve objects contextually (see Fig.6).

    Figure  9.  Glimpse of the embedded results of different objects. (a) Retrieved objects from a double bed. (b) Retrieved objects from a coffee table. A high-resolution version of the entire latent space is included in the link in Subsection 5.1.

    The overall rendering of the space is shown in the link below 1.

    To verify the effectiveness of the optimized latent space, we design a classification task that aims to classify objects based on the learned latent vectors. We separate our dataset into a training set and a testing set, which contain 80% and 20% of the data, respectively.

    Three methods are compared: classification with only geometric descriptors of objects, classification with only the latent vectors in this paper, and a combination of both. In terms of geometric descriptors, we leverage PointNet, which learns global 1024-dimensional features of objects. As for the latent vectors, we deliberately exclude the shape similarity and category mentioned in Subsection 3.1 to eliminate their influence on the classification task. We then feed the latent vectors into a multilayer perceptron. For the combination method, geometric and contextual features are first trained and then concatenated together before being passed to the classifier. Fig.10 illustrates the differences between the three methods.

    Figure  10.  Classifying objects using geometric features (green), contextual features (orange), and a combination of them. Note that for the combination, we tile contextual features to ensure the features have the same length as geometric features.
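
    A sketch of the combination classifier of Fig.10 in PyTorch; the hidden size, the single hidden layer, and the exact tiling scheme are assumptions on our part, not the released architecture.

```python
import torch
import torch.nn as nn

class CombinedClassifier(nn.Module):
    """MLP over concatenated geometric (PointNet, 1024-d) and contextual (latent) features.
    Contextual features are tiled to the geometric length, following Fig.10."""
    def __init__(self, num_classes, geo_dim=1024, ctx_dim=16, hidden=256):
        super().__init__()
        self.tile = geo_dim // ctx_dim                 # how many times to repeat ctx features
        self.mlp = nn.Sequential(
            nn.Linear(geo_dim + self.tile * ctx_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, geo_feat, ctx_feat):
        ctx = ctx_feat.repeat(1, self.tile)            # tile contextual features
        return self.mlp(torch.cat([geo_feat, ctx], dim=1))
```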

    As shown in Fig.11, classification with contextual features achieves higher accuracy than with geometric features, even though the latter are extracted with more complex architectures[37]. Moreover, the combination of both features further improves the accuracy. Therefore, ScenePalette constructs meaningful latent spaces for identifying objects.

    Figure  11.  Results of the classification task with (a) geometric features, (b) contextual features and (c) their combination, where the red lines, green lines, blue lines and black lines denote training accuracy, training loss, evaluation accuracy and evaluation loss, respectively.

    One exclusive feature of ScenePalette is that it provides a continuous representation of contexts among objects, which paves the way for many applications of 3D objects. In this subsection, we show two applications on our platform (see Fig.6). Both retrieve and explore object collections contextually.

    First, as shown in Fig.12, a palette is placed on the left of an existing 3D scene for retrieving objects according to the current context. The palette is visualized as a force-directed graph layout, where the placement of elements is consistent with that of the latent space, which reflects contextual relations. Each time a user clicks an object (a single context of an object) or a room (multiple contexts of objects), a palette is expanded to show the correlated contextual results with respect to the current context. Since a latent space includes all objects in a dataset, each palette expansion will show up to \iota objects. In this paper, we empirically set \iota = 20 , since we do not want too many or too few choices presented to users. A latent space is rendered simultaneously when a palette is expanded. A user may left-click elements on the palette and add the selected element to the 3D scene. She/he can also drag the palette to explore more weakly correlated elements.

    Figure  12.  “Drawing” 3D scenes. (a) (b) Clicking different objects or rooms adjusts the palette on the left. (c) Users can explore the palette by dragging and zooming. They can insert an element from the palette into scenes. A video is attached to demonstrate “drawing” 3D scenes.
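
    A minimal sketch of how a palette expansion could retrieve the top- \iota contextually related objects from the latent space; representing the current context by the mean latent vector of the selected objects is our simplification, and the names are hypothetical.

```python
import numpy as np

def expand_palette(latent, context_ids, iota=20):
    """Return the iota objects nearest in the latent space to the current context
    (the selected object or room contents given by context_ids)."""
    query = latent[context_ids].mean(axis=0)
    dist = np.linalg.norm(latent - query, axis=1)
    dist[context_ids] = np.inf                # do not re-suggest already selected objects
    return np.argsort(dist)[:iota]
```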

    We also combine ScenePalette with MageAdd[7], a state-of-the-art real-time interactive framework for 3D scene synthesis. MageAdd automatically suggests different categories of suitable objects near the mouse cursor, but it does not allow users to select related objects. In contrast, clutterpalette[50] allows users to choose objects after each mouse click, though clutterpalette is designed only for small objects that enhance scene details. Therefore, in our application, users can first choose an object on the palette based on the optimized contextual latent space. Then the system follows the priors of MageAdd and transforms the selected object according to the position of the mouse cursor. In other words, ScenePalette enables intelligent user interactions. Users can select objects based on their preferences while the system still automatically controls the transformations of the selected objects. Thus, contextual exploration of objects can be integrated to assist interactive scene synthesis. A video is provided in the link below 2 to demonstrate this application.

    Second, as shown in Fig.13, we also embed the palette into 3D scenes where an object or a set of objects is already selected. After the user clicks an object, the palette spreads around the selected object, and the original scene is temporarily hidden. By clicking elements in the palette, more related elements are rendered based on the entire latent space. After selecting an object in the latent space, the object is inserted back into the scene based on the Room Depth model[7], which learns the relations between objects and room shapes.

    Figure  13.  After (a) selecting an object, the scene is transformed to (b) a latent space where the selected object is centered and surrounded by correlated contextual objects. (c) The context is refined by clicking other objects, and more correlated objects appear. (d) An object of interest may be selected and (e) the scene is switched to the previous one where the new object is inserted.

    There are still some limitations to our work.

    First, as a data-driven framework, our method struggles to map 3D objects to latent spaces in a way that fully matches human intuition. For example, two similar nightstands should have the same distance to a suitable double bed from a human perspective. However, due to the inherent bias in the dataset, this is not the case in our results, since the two nightstands may occur with different frequencies and have varied relations with other objects. Although we propose parent-level relations, this problem is still not fully alleviated.

    Second, the mapping from a dataset to a latent space is one-to-many because the weights of the constraints in Section 3 are adjustable and users have different preferences for 3D layouts. For example, given a double bed, some users may strongly prefer the palette to render only directly related objects such as nightstands instead of objects such as desks or wardrobes. Also, the priority of objects changes during interactions, e.g., the importance of double beds and nightstands will be lower after they are selected, since the bedroom already has a “double bed set”. Consequently, personalized and input-sensitive latent spaces will also be explored.

    In this paper, we proposed a novel framework for extracting contexts of 3D models as multiplex networks and generating latent contextual representations for objects. Our experiments demonstrate that latent contextual spaces are compatible with traditional object descriptors and practical for various real-world applications. Therefore, generating latent contextual spaces of 3D object collections is a promising direction for research on 3D scenes, e.g., contextually exploring objects for 3D scenes. Additionally, as discussed in Section 6, several limitations remain to be addressed.


  • [1]

    Song S, Yu F, Zeng A, Chang A X, Savva M, Funkhouser T. Semantic scene completion from a single depth image. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Jul. 2017, pp.190–198. DOI: 10.1109/CVPR.2017.28.

    [2]

    Fu H, Cai B, Gao L, Zhang L X, Wang J, Li C, Zeng Q, Sun C, Jia R, Zhao B, Zhang H. 3D-FRONT: 3D furnished rooms with layOuts and semaNTics. In Proc. the 2021 IEEE/CVF International Conference on Computer Vision, Oct. 2021, pp.10913–10922. DOI: 10.1109/ICCV48922.2021.01075.

    [3]

    Fu Q, Chen X, Wang X, Wen S, Zhou B, Fu H. Adaptive synthesis of indoor scenes via activity-associated object relation graphs. ACM Trans. Graphics, 2017, 36(6): Article No. 201. DOI: 10.1145/3130800.3130805.

    [4]

    Zhang S H, Zhang S K, Liang Y, Hall P. A survey of 3D indoor scene synthesis. Journal of Computer Science and Technology, 2019, 34(3): 594–608. DOI: 10.1007/s11390-019-1929-5.

    [5]

    Zhang S H, Zhang S K, Xie W Y, Luo C Y, Yang Y L, Fu H. Fast 3D indoor scene synthesis by learning spatial relation priors of objects. IEEE Trans. Visualization and Computer Graphics, 2022, 28(9): 3082–3092. DOI: 10.1109/TVCG.2021.3050143.

    [6]

    Yan M, Chen X, Zhou J. An interactive system for efficient 3D furniture arrangement. In Proc. the 2017 Computer Graphics International Conference, Jun. 2017, Article No. 29. DOI: 10.1145/3095140.3095169.

    [7]

    Zhang S K, Li Y X, He Y, Yang Y L, Zhang S H. MageAdd: Real-time interaction simulation for scene synthesis. In Proc. the 29th ACM International Conference on Multimedia, Oct. 2021, pp.965–973. DOI: 10.1145/3474085.3475194.

    [8]

    Handa A, Patraucean V, Badrinarayanan V, Stent S, Cipolla R. Understanding real world indoor scenes with synthetic data. In Proc. the 2016 IEEE Conference on Computer Vision, Jun. 2016, pp.4077–4085. DOI: 10.1109/CVPR.2016.442.

    [9]

    Luo A, Zhang Z, Wu J, Tenenbaum J B. End-to-end optimization of scene layout. In Proc. the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2020, pp.3753–3762. DOI: 10.1109/CVPR42600.2020.00381.

    [10]

    Huang S S, Shamir A, Shen C H, Zhang H, Sheffer A, Hu S M, Cohen-Or D. Qualitative organization of collections of shapes via quartet analysis. ACM Trans. Graphics, 2013, 32(4): Article No. 71. DOI: 10.1145/2461912.2461954.

    [11]

    Chen D Y, Tian X P, Shen Y T, Ouhyoung M. On visual similarity based 3D model retrieval. Computer Graphics Forum, 2003, 22(3): 223–232. DOI: 10.1111/1467-8659.00669.

    [12]

    Cai H Y, Zheng V W, Chang K C C. A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Trans. Knowledge and Data Engineering, 2018, 30(9): 1616–1637. DOI: 10.1109/TKDE.2018.2807452.

    [13]

    Zhang H, Qiu L, Yi L, Song Y. Scalable multiplex network embedding. In Proc. the 27th International Joint Conference on Artificial Intelligence, Jul. 2018, pp.3082–3088. DOI: 10.5555/3304889.3305089.

    [14]

    Zhang S K, Xie W Y, Zhang S H. Geometry-based layout generation with hyper-relations AMONG objects. Graphical Models, 2021, 116: 101104. DOI: 10.1016/j.gmod.2021.101104.

    [15]

    He Y, Shen Z, Cui P. Towards non-I.I.D. image classification: A dataset and baselines. Pattern Recognition, 2021, 110: 107383. DOI: 10.1016/j.patcog.2020.107383.

    [16]

    Yu L F, Yeung S K, Tang C K, Terzopoulos D, Chan T F, Osher S. Make it home: Automatic optimization of furniture arrangement. ACM Trans. Graphics, 2011, 30(4): 86. DOI: 10.1145/2010324.1964981.

    [17]

    Chang A, Savva M, Manning C D. Learning spatial knowledge for text to 3D scene generation. In Proc. the 2014 Conference on Empirical Methods in Natural Language Processing, Oct. 2014, pp.2028–2038. DOI: 10.3115/v1/D14-1217.

    [18]

    Perozzi B, Al-Rfou R, Skiena S. DeepWalk: Online learning of social representations. In Proc. the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2014, pp.701–710. DOI: 10.1145/2623330.2623732.

    [19]

    He R, McAuley J. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proc. the 25th International Conference on World Wide Web, Apr. 2016, pp.507–517. DOI: 10.1145/2872427.2883037.

    [20]

    Tang L, Wang X, Liu H. Uncoverning groups via heterogeneous interaction analysis. In Proc. the 9th IEEE International Conference on Data Mining, Dec. 2009, pp.503–512. DOI: 10.1109/ICDM.2009.20.

    [21]

    Fisher M, Hanrahan P. Context-based search for 3D models. ACM Trans. Graphics, 2010, 29(6): Article No. 182. DOI: 10.1145/1882261.1866204.

    [22]

    Xu K, Chen K, Fu H, Sun W L, Hu S M. Sketch2Scene: Sketch-based co-retrieval and co-placement of 3D models. ACM Trans. Graphics, 2013, 32(4): Article No. 123. DOI: 10.1145/2461912.2461968.

    [23]

    Weiss T, Litteneker A, Duncan N, Nakada M, Jiang C, Yu L F, Terzopoulos D. Fast and scalable position-based layout synthesis. IEEE Trans. Visualization and Computer Graphics, 2019, 25(12): 3231–3243. DOI: 10.1109/TVCG.2018.2866436.

    [24]

    Fisher M, Savva M, Hanrahan P. Characterizing structural relationships in scenes using graph kernels. ACM Trans. Graphics, 2011, 30(4): Article No. 34. DOI: 10.1145/2010324.1964929.

    [25]

    Xu K, Ma R, Zhang H, Zhu C, Shamir A, Cohen-Or D, Huang H. Organizing heterogeneous scene collections through contextual focal points. ACM Trans. Graphics, 2014, 33(4): Article No. 35. DOI: 10.1145/2601097.2601109.

    [26]

    Cui P, Wang X, Pei J, Zhu W. A survey on network embedding. IEEE Trans. Knowledge and Data Engineering, 2019, 31(5): 833–852. DOI: 10.1109/TKDE.2018.2849727.

    [27]

    Wang X, Cui P, Wang J, Pei J, Zhu W, Yang S. Community preserving network embedding. In Proc. the 31st AAAI Conference on Artificial Intelligence, Nov. 2017, pp.203–209. DOI: 10.1145/3357384.3357947.

    [28]

    Newell A, Huang Z, Deng J. Associative embedding: End-to-end learning for joint detection and grouping. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.2274–2284.

    [29]

    Kleiman Y, van Kaick O, Sorkine-Hornung O, Cohen-Or D. SHED: Shape edit distance for fine-grained shape similarity. ACM Trans. Graphics, 2015, 34(6): Article No. 235. DOI: 10.1145/2816795.2818116.

    [30]

    Kohonen T. Self-organized formation of topologically correct feature maps. Biological Cybernetics, 1982, 43(1): 59–69. DOI: 10.5555/65669.104428.

    [31]

    Liu W, Chen P Y, Yeung S, Suzumura T, Chen L. Principled multilayer network embedding. In Proc. the 2017 IEEE International Conference on Data Mining Workshops, Nov. 2017, pp.134–141. DOI: 10.1109/ICDMW.2017.23.

    [32]

    De Sá H R, Prudêncio R B C. Supervised link prediction in weighted networks. In Proc. the 2011 International Joint Conference on Neural Networks, Sept. 2011, pp.2281–2288. DOI: 10.1109/IJCNN.2011.6033513.

    [33]

    Tangelder J W H, Veltkamp R C. A survey of content based 3D shape retrieval methods. Multimedia Tools and Applications, 2008, 39(3): 441–471. DOI: 10.1007/s11042-007-0181-0.

    [34]

    Kazhdan M, Funkhouser T, Rusinkiewicz S. Rotation invariant spherical harmonic representation of 3D shape descriptors. In Proc. the 2003 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, Jun. 2003, pp.156–164. DOI: 10.5555/882370.882392.

    [35]

    Osada R, Funkhouser T, Chazelle B, Dobkin D. Shape distributions. ACM Trans. Graphics, 2002, 21(4): 807–832. DOI: 10.1145/571647.571648.

    [36]

    Shilane P, Min P, Kazhdan M, Funkhouser T. The Princeton shape benchmark. In Proc. the 2004 Shape Modeling Applications, Jun. 2004, pp.167–178. DOI: 10.1109/SMI.2004.1314504.

    [37]

    Charles R Q, Su H, Kaichun M, Guibas L J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Jul. 2017, pp.77–85. DOI: 10.1109/CVPR.2017.16.

    [38]

    Zeng A, Song S, Nießner M, Fisher M, Xiao J, Funkhouser T. 3DMatch: Learning local geometric descriptors from RGB-D reconstructions. In Proc. the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Jul. 2017, pp.199–208. DOI: 10.1109/CVPR.2017.29.

    [39]

    Chang A X, Funkhouser T, Guibas L, Hanrahan P, Huang Q, Li Z, Savarese S, Savva M, Song S, Su H, Xiao J, Yi L, Yu F. ShapeNet: An information-rich 3D model repository. arXiv: 1512.03012, 2015. https://arxiv.org/abs/1512.03012, Sept. 2024.

    [40]

    Diggle P J, Besag J, Gleaves J T. Statistical analysis of spatial point patterns by means of distance methods. Biometrics, 1976, 32(3): 659–667. DOI: 10.2307/2529754.

    [41]

    Gignoux J, Duby C, Barot S. Comparing the performances of Diggle’s tests of spatial randomness for small samples with and without edge-effect correction: Application to ecological data. Biometrics, 1999, 55(1): 156–164. DOI: 10.1111/j.0006-341x.1999.00156.x.

    [42]

    Rosin P. Thresholding for change detection. In Proc. the 6th International Conference on Computer Vision, Jan. 1998, pp.274–279. DOI: 10.1109/ICCV.1998.710730.

    [43]

    Diggle P J. On parameter estimation and goodness-of-fit testing for spatial point patterns. Biometrics, 1979, 35(1): 87–101. DOI: 10.2307/2529938.

    [44]

    Assunção R. Testing spatial randomness by means of angles. Biometrics, 1994, 50(2): 531–537. DOI: 10.2307/2533397.

    [45]

    van Kaick O, Fish N, Kleiman Y, Asafi S, Cohen-Or D. Shape segmentation by approximate convexity analysis. ACM Trans. Graphics, 2014, 34(1): Article No. 4. DOI: 10.1145/2611811.

    [46]

    Hu M K. Visual pattern recognition by moment invariants. IRE Trans. Information Theory, 1962, 8(2): 179–187. DOI: 10.1109/TIT.1962.1057692.

    [47]

    Gallager R G. Stochastic Processes: Theory for Applications. Cambridge University Press, 2013.

    [48]

    Li J, Chen C, Tong H, Liu H. Multi-layered network embedding. In Proc. the 2018 SIAM International Conference on Data Mining, May 2018, pp.684–692. DOI: 10.1137/1.9781611975321.77.

    [49]

    van der Maaten L, Hinton G. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008, 9(86): 2579–2605.

    [50]

    Yu L F, Yeung S K, Terzopoulos D. The clutterpalette: An interactive tool for detailing indoor scenes. IEEE Trans. Visualization and Computer Graphics, 2016, 22(2): 1138–1148. DOI: 10.1109/TVCG.2015.2417575.

Publication History
  • Received: 2022-02-07
  • Accepted: 2022-12-23
  • Available online: 2023-06-20
  • Issue published: 2024-10-30

目录

/

返回文章
返回